National Stage PCT/BE2004/000103

Application No. To Be Assigned



First Preliminary Amendment 



REMARKS 

1. This is the first Preliminary Amendment filed in this application.

2. Claims 1-12 were originally presented in this application. By the foregoing
Amendments, claims 1-7 and 9-11 have been amended, and claim 13 is new. Claims 8
and 12 have been canceled. Thus, upon entry of this paper, claims 1-7, 9-11 and 13 will be
pending in this application. Of these 11 claims, two (2) claims (claims 1 and 9) are
independent.

Amendments to the Specification 

3. The specification has been amended to correct grammatical, typographical and 
other minor errors as well as to add a "Cross-Reference to Related Applications" Section. 
Applicants submit herewith as Attachment 1 a substitute specification under 37 C.F.R. 
§1.121(b)(3). 

4. The changes to the specification do not introduce new matter. Accordingly, the
substitute specification is compliant with 37 C.F.R. §1.125(b).

5. Pursuant to 37 C.F.R. §1.125(c), the substitute specification is in clean form
without markings as to amended material. Enclosed as Attachment 2 is a marked-up
version of the specification of record showing all changes made to the specification.
Further, the paragraphs of the substitute specification are individually numbered with
Arabic numerals so that any future amendment to the specification can be made by
replacement paragraph in accordance with 37 C.F.R. §1.121(b)(1).

Amendments to the Claims 

6. Applicants have amended the claims in accordance with 37 C.F.R. §1.121(c)(1)(i).

Listing of Attachments

7. The following documents, referenced above, are provided as attachments to this 
paper: 

Attachment 1: Substitute Specification

Attachment 2: Marked Up Version of Specification Showing Changes Made

METHOD AND DEVICE FOR NOISE REDUCTION



CROSS-REFERENCE TO RELATED APPLICATIONS 

[0001] This application is a national stage application under 35 USC §371(c) of PCT
Application No. PCT/BE2004/000103, entitled "Method and Device for Noise
Reduction," filed on July 12, 2004, which claims the priority of Australian Patent No.
2003903575, filed on July 11, 2003, and Australian Patent No. 2004901931, filed on
April 8, 2004. The entire disclosure and contents of the above applications are hereby
incorporated by reference herein.

BACKGROUND 

Field of the Invention

[0002] The present invention is related to a method and device for
adaptively reducing the noise in speech communication applications. 

Related Art

[0003] There are a variety of medical implants which deliver electrical stimulation to
a patient or recipient ("recipient" herein) for a variety of therapeutic benefits. For
example, the hair cells of the cochlea of a normal healthy ear convert acoustic signals
into nerve impulses. People who are profoundly deaf due to the absence or destruction
of cochlea hair cells are unable to derive suitable benefit from conventional hearing aid
systems. Prosthetic hearing implant systems have been developed to provide such
persons with the ability to perceive sound. Prosthetic hearing implant systems bypass
the hair cells in the cochlea to directly deliver electrical stimulation to auditory nerve
fibers, thereby allowing the brain to perceive a hearing sensation resembling the natural
hearing sensation.

[0004] The electrodes implemented in stimulating medical implants vary according
to the device and tissue which is to be stimulated. For example, the cochlea is
tonotopically mapped and partitioned into regions, with each region being responsive to
stimulus signals in a particular frequency range. To accommodate this property of the
cochlea, prosthetic hearing implant systems typically include an array of electrodes each
constructed and arranged to deliver an appropriate stimulating signal to a particular
region of the cochlea.

[0005] To achieve an optimal electrode position close to the inside wall of the
cochlea, the electrode assembly should assume this desired position upon or
immediately following implantation into the cochlea. It is also desirable that the
electrode assembly be shaped such that the insertion process causes minimal trauma to
the sensitive structures of the cochlea. Usually the electrode assembly is held in a
straight configuration at least during the initial stages of the insertion procedure,
conforming to the natural shape of the cochlea once implantation is complete.

[0006] Prosthetic hearing implant systems typically have two primary components:
an external component commonly referred to as a speech processor, and an implanted
component commonly referred to as a receiver/stimulator unit. Traditionally, both of
these components cooperate with each other to provide sound sensations to a recipient.

[0007] The external component traditionally includes a microphone that detects
sounds, such as speech and environmental sounds, a speech processor that selects and
converts certain detected sounds, particularly speech, into a coded signal, a power
source such as a battery, and an external transmitter antenna.

[0008] The coded signal output by the speech processor is transmitted
transcutaneously to the implanted receiver/stimulator unit, commonly located within a
recess of the temporal bone of the recipient. This transcutaneous transmission occurs
via the external transmitter antenna which is positioned to communicate with an
implanted receiver antenna disposed within the receiver/stimulator unit. This
communication transmits the coded sound signal while also providing power to the
implanted receiver/stimulator unit. Conventionally, this link has been in the form of a
radio frequency (RF) link, but other communication and power links have been
proposed and implemented with varying degrees of success.

[0009] The implanted receiver/stimulator unit traditionally includes the noted
receiver antenna that receives the coded signal and power from the external component.
The implanted unit also includes a stimulator that processes the coded signal and
outputs an electrical stimulation signal to an intra-cochlea electrode assembly mounted
to a carrier member. The electrode assembly typically has a plurality of electrodes that
apply the electrical stimulation directly to the auditory nerve to produce a hearing
sensation corresponding to the original detected sound.

[0002] In speech communication applications, such as teleconferencing, hands-free
telephony and hearing aids, the presence of background noise may significantly reduce
the intelligibility of the desired speech signal. Hence, the use of a noise reduction
algorithm is necessary. Multi-microphone systems exploit spatial information in
addition to temporal and spectral information of the desired signal and noise signal and
are thus preferred to single microphone procedures. Because of aesthetic reasons, multi-
microphone techniques for e.g., hearing aid applications go together with the use of
small-sized arrays. Considerable noise reduction can be achieved with such arrays, but
at the expense of an increased sensitivity to errors in the assumed signal model such as
microphone mismatch, reverberation, ... (see e.g. Stadler & Rabinowitz, 'On the
potential of fixed arrays for hearing aids', J. Acoust. Soc. Amer., vol. 94, no. 3, pp.
1332-1342, Sep. 1993). In hearing aids, microphones are rarely matched in gain and
phase. Gain and phase differences between microphone characteristics can amount up to
6 dB and 10°, respectively.

[0003] A widely studied multi-channel adaptive noise reduction algorithm is the
Generalised Sidelobe Canceller (GSC) (see e.g. Griffiths & Jim, 'An alternative
approach to linearly constrained adaptive beamforming', IEEE Trans. Antennas
Propag., vol. 30, no. 1, pp. 27-34, Jan. 1982 and US 5473701 'Adaptive microphone
array'). The GSC consists of a fixed, spatial pre-processor, which includes a fixed
beamformer and a blocking matrix, and an adaptive stage based on an Adaptive Noise
Canceller (ANC). The ANC minimises the output noise power while the blocking
matrix should avoid speech leakage into the noise references. The standard GSC
assumes the desired speaker location, the microphone characteristics and positions to be
known, and reflections of the speech signal to be absent. If these assumptions are
fulfilled, it provides an undistorted enhanced speech signal with minimum residual
noise. However, in reality these assumptions are often violated, resulting in so-called
speech leakage and hence speech distortion. To limit speech distortion, the ANC is
typically adapted during periods of noise only. When used in combination with small-
sized arrays, e.g., in hearing aid applications, an additional robustness constraint (see
Cox et al., 'Robust adaptive beamforming', IEEE Trans. Acoust. Speech and Signal
Processing, vol. 35, no. 10, pp. 1365-1376, Oct. 1987) is required to guarantee
performance in the presence of small errors in the assumed signal model, such as
microphone mismatch. A widely applied method consists of imposing a Quadratic
Inequality Constraint to the ANC (QIC-GSC). For Least Mean Squares (LMS)
updating, the Scaled Projection Algorithm (SPA) is a simple and effective technique
that imposes this constraint. However, using the QIC-GSC goes at the expense of less
noise reduction.

[0004] A Multi-channel Wiener Filtering (MWF) technique has been proposed (see
Doclo & Moonen, 'GSVD-based optimal filtering for single and multimicrophone
speech enhancement', IEEE Trans. Signal Processing, vol. 50, no. 9, pp. 2230-2244,
Sep. 2002) that provides a Minimum Mean Square Error (MMSE) estimate of the
desired signal portion in one of the received microphone signals. In contrast to the ANC
of the GSC, the MWF is able to take speech distortion into account in its optimisation
criterion, resulting in the Speech Distortion Weighted Multi-channel Wiener Filter
(SDW-MWF). The (SDW-)MWF technique is uniquely based on estimates of the
second order statistics of the recorded speech signal and the noise signal. A robust
speech detection is thus again needed. In contrast to the GSC, the (SDW-)MWF does
not make any a priori assumptions about the signal model such that no or a less severe
robustness constraint is needed to guarantee performance when used in combination
with small-sized arrays. Especially in complicated noise scenarios such as multiple
noise sources or diffuse noise, the (SDW-)MWF outperforms the GSC, even when the
GSC is supplemented with a robustness constraint.

[0005] A possible implementation of the (SDW-)MWF is based on a Generalised
Singular Value Decomposition (GSVD) of an input data matrix and a noise data matrix.
A cheaper alternative based on a QR Decomposition (QRD) has been proposed in
Rombouts & Moonen, 'QRD-based unconstrained optimal filtering for acoustic noise
reduction', Signal Processing, vol. 83, no. 9, pp. 1889-1904, Sep. 2003. Additionally, a
subband implementation results in improved intelligibility at a significantly lower cost
compared to the fullband approach. However, in contrast to the GSC and the QIC-GSC,
no cheap stochastic gradient based implementation of the (SDW-)MWF is available yet.
In Nordholm et al., 'Adaptive microphone array employing calibration signals: an
analytical evaluation', IEEE Trans. Speech, Audio Processing, vol. 7, no. 3, pp. 241-
252, May 1999, an LMS based algorithm for the MWF has been developed. However,
said algorithm needs recordings of calibration signals. Since room acoustics,
microphone characteristics and the location of the desired speaker change over time,
frequent re-calibration is required, making this approach cumbersome and expensive.
Also an LMS based SDW-MWF has been proposed that avoids the need for calibration
signals (see Florencio & Malvar, 'Multichannel filtering for optimum noise reduction in
microphone arrays', Int. Conf. on Acoust., Speech, and Signal Proc., Salt Lake City,
USA, pp. 197-200, May 2001). This algorithm however relies on some independence
assumptions that are not necessarily satisfied, resulting in degraded performance.

[0006] The GSC and MWF techniques are now presented more in detail.

Generalised Sidelobe Canceller (GSC)

[0007] Fig. 1 describes the concept of the Generalised Sidelobe Canceller
(GSC), which consists of a fixed, spatial pre-processor, i.e. a fixed beamformer $A(z)$ and
a blocking matrix $B(z)$, and an ANC. Given $M$ microphone signals

$$u_i[k] = u_i^s[k] + u_i^n[k], \quad i = 1, \ldots, M \qquad \text{(equation 1)}$$

with $u_i^s[k]$ the desired speech contribution and $u_i^n[k]$ the noise contribution, the fixed
beamformer $A(z)$ (e.g. delay-and-sum) creates a so-called speech reference

$$y_0[k] = y_0^s[k] + y_0^n[k], \qquad \text{(equation 2)}$$

by steering a beam towards the direction of the desired signal, and comprising a speech
contribution $y_0^s[k]$ and a noise contribution $y_0^n[k]$. The blocking matrix $B(z)$ creates
$M-1$ so-called noise references

$$y_i[k] = y_i^s[k] + y_i^n[k], \quad i = 1, \ldots, M-1 \qquad \text{(equation 3)}$$

by steering zeroes towards the direction of the desired signal source such that the noise
contributions $y_i^n[k]$ are dominant compared to the speech leakage contributions $y_i^s[k]$.

In the sequel, the superscripts $s$ and $n$ are used to refer to the speech and the noise
contribution of a signal. During periods of speech + noise, the references $y_i[k]$,
$i = 0, \ldots, M-1$ contain speech + noise. During periods of noise only, the references only
consist of a noise component, i.e. $y_i[k] = y_i^n[k]$. The second order statistics of the noise
signal are assumed to be quite stationary such that they can be estimated during periods
of noise only.
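The fixed spatial pre-processor described above can be sketched in a few lines. This is only an illustration under stated assumptions: the source is taken to be broadside, so the speech reaches all microphones simultaneously and no steering delays are needed; the beamformer is a plain average; and the blocking matrix subtracts adjacent microphones. The function name `spatial_preprocess` and the pairwise-difference blocking matrix are hypothetical choices, since the text does not prescribe a particular $B(z)$.

```python
def spatial_preprocess(mics):
    """Split M time-aligned microphone signals into one speech reference
    and M-1 noise references.

    mics: list of M equal-length lists of samples, assumed to be already
    time-aligned towards the desired speaker (hypothetical broadside set-up).
    """
    M = len(mics)
    n = len(mics[0])
    # Delay-and-sum beamformer: with aligned inputs this is a plain average.
    speech_ref = [sum(mics[m][k] for m in range(M)) / M for k in range(n)]
    # Blocking matrix: adjacent differences cancel any component that is
    # identical in both microphones, i.e. ideally matched speech.
    noise_refs = [[mics[m][k] - mics[m + 1][k] for k in range(n)]
                  for m in range(M - 1)]
    return speech_ref, noise_refs


# Matched microphones: speech is blocked completely from the noise reference.
speech = [1.0, -2.0, 0.5, 3.0]
y0, refs = spatial_preprocess([speech, speech])

# A 10 % gain mismatch on the second microphone lets speech leak through.
mismatched = [1.1 * s for s in speech]
_, leaky_refs = spatial_preprocess([speech, mismatched])
```

With matched microphones the noise reference is identically zero for a speech-only input, whereas the mismatched case leaves a scaled copy of the speech in the noise reference. That residue is the speech leakage the text refers to.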

[0008] To design the fixed, spatial pre-processor, assumptions are made
about the microphone characteristics, the speaker position and the microphone positions
and furthermore reverberation is assumed to be absent. If these assumptions are
satisfied, the noise references do not contain any speech, i.e., $y_i^s[k] = 0$, for $i = 1, \ldots, M-1$.
However, in practice, these assumptions are often violated (e.g. due to microphone
mismatch and reverberation) such that speech leaks into the noise references. To limit
the effect of such speech leakage, the ANC filter $\mathbf{w}_{1:M-1} \in \mathbb{C}^{(M-1)L \times 1}$

$$\mathbf{w}_{1:M-1}^H = [\mathbf{w}_1^H \;\; \mathbf{w}_2^H \;\; \cdots \;\; \mathbf{w}_{M-1}^H] \qquad \text{(equation 4)}$$

where

$$\mathbf{w}_i = [w_i[0] \;\; w_i[1] \;\; \cdots \;\; w_i[L-1]]^T, \qquad \text{(equation 5)}$$

with $L$ the filter length, is adapted during periods of noise only. (Note that in a time-
domain implementation the input signals of the adaptive filter $\mathbf{w}_{1:M-1}$ and the filter $\mathbf{w}_{1:M-1}$
are real. In the sequel the formulas are generalised to complex input signals such that
they can also be applied to a subband implementation.) Hence, the ANC filter $\mathbf{w}_{1:M-1}$
minimises the output noise power, i.e.

$$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} E\{|y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H[k]\,\mathbf{y}_{1:M-1}^n[k]|^2\} \qquad \text{(equation 6)}$$

leading to

$$\mathbf{w}_{1:M-1} = E\{\mathbf{y}_{1:M-1}^n[k]\,\mathbf{y}_{1:M-1}^{n,H}[k]\}^{-1} E\{\mathbf{y}_{1:M-1}^n[k]\,y_0^{n,*}[k-\Delta]\}, \qquad \text{(equation 7)}$$

where

$$\mathbf{y}_{1:M-1}^{n,H}[k] = [\mathbf{y}_1^{n,H}[k] \;\; \mathbf{y}_2^{n,H}[k] \;\; \cdots \;\; \mathbf{y}_{M-1}^{n,H}[k]] \qquad \text{(equation 8)}$$

$$\mathbf{y}_i^n[k] = [y_i^n[k] \;\; y_i^n[k-1] \;\; \cdots \;\; y_i^n[k-L+1]]^T \qquad \text{(equation 9)}$$

and where $\Delta$ is a delay applied to the speech reference to allow for non-causal taps in
the filter $\mathbf{w}_{1:M-1}$. The delay $\Delta$ is usually set to $\lceil L/2 \rceil$, where $\lceil x \rceil$ denotes the smallest
integer equal to or larger than $x$. The subscript $1{:}M-1$ in $\mathbf{w}_{1:M-1}$ and $\mathbf{y}_{1:M-1}$ refers to the
subscripts of the first and the last channel component of the adaptive filter and input
vector, respectively.
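An adaptive noise canceller of this kind is often run with a stochastic-gradient update rather than the closed-form solution. The sketch below is a minimal, real-valued, single-noise-reference example that uses a normalised LMS (NLMS) update. NLMS, the step size, the regularisation constant `eps` and the function name `nlms_anc` are assumptions made for illustration and are not taken from the text. The input is assumed to consist of noise-only samples, so the filter adapts at every step.

```python
import random


def nlms_anc(speech_ref, noise_ref, L=4, delay=2, step=0.5, eps=1e-8):
    """Adaptive noise canceller with one real-valued noise reference.

    Assumes every sample belongs to a noise-only period, so the filter may
    adapt at each step. Returns (output_samples, final_filter_taps).
    """
    w = [0.0] * L
    out = []
    for k in range(len(speech_ref)):
        # Tap-delay line [y1[k], y1[k-1], ..., y1[k-L+1]], zero-padded at start.
        x = [noise_ref[k - j] if k - j >= 0 else 0.0 for j in range(L)]
        # Speech reference delayed by `delay` samples.
        d = speech_ref[k - delay] if k - delay >= 0 else 0.0
        # ANC output: delayed speech reference minus filtered noise reference.
        e = d - sum(wj * xj for wj, xj in zip(w, x))
        norm = sum(xj * xj for xj in x) + eps
        # NLMS update that reduces the output (noise) power.
        w = [wj + step * e * xj / norm for wj, xj in zip(w, x)]
        out.append(e)
    return out, w


# Synthetic noise-only data: the noise in the speech reference is a scaled
# copy of the noise reference, so the ideal filter has a single 0.8 tap at
# the position matching the delay.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
output, taps = nlms_anc([0.8 * v for v in noise], noise)
```

In this noiseless toy set-up the filter converges to a single tap of 0.8 at index `delay` and the output power tends to zero. With real recordings the residual would not vanish, and adaptation would have to be frozen whenever speech is present.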
[0009] Under ideal conditions ($y_i^s[k] = 0$, $i = 1, \ldots, M-1$), the GSC
minimises the residual noise while not distorting the desired speech signal, i.e.
$z^s[k] = y_0^s[k-\Delta]$. However, when used in combination with small-sized arrays, a small
error in the assumed signal model (resulting in $y_i^s[k] \neq 0$, $i = 1, \ldots, M-1$) already
suffices to produce a significantly distorted output speech signal $z^s[k]$

$$z^s[k] = y_0^s[k-\Delta] - \mathbf{w}_{1:M-1}^H\,\mathbf{y}_{1:M-1}^s[k], \qquad \text{(equation 10)}$$

even when only adapting during noise-only periods, such that a robustness constraint on
$\mathbf{w}_{1:M-1}$ is required. In addition, the fixed beamformer $A(z)$ should be designed such that
the distortion in the speech reference $y_0^s[k]$ is minimal for all possible model errors. In
the sequel, a delay-and-sum beamformer is used. For small-sized arrays, this
beamformer offers sufficient robustness against signal model errors, as it minimises the
noise sensitivity. The noise sensitivity is defined as the ratio of the spatially white noise
gain to the gain of the desired signal and is often used to quantify the sensitivity of an
algorithm against errors in the assumed signal model. When statistical knowledge is
given about the signal model errors that occur in practice, the fixed beamformer and the
blocking matrix can be further optimised.
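The noise sensitivity defined above can be computed directly for a simple case. The sketch assumes frequency-independent beamformer weights and a desired signal that arrives identically at every microphone (an all-ones steering vector). Both simplifications, and the function name `noise_sensitivity`, are assumptions for illustration.

```python
def noise_sensitivity(weights):
    """Ratio of spatially white noise gain to desired-signal gain.

    Assumes the desired signal is identical at all microphones, so its gain
    is |sum of weights|^2, and that the noise is uncorrelated across
    microphones with equal power, so its gain is the sum of |weight|^2.
    """
    white_noise_gain = sum(abs(w) ** 2 for w in weights)
    signal_gain = abs(sum(weights)) ** 2
    return white_noise_gain / signal_gain


# Two-microphone comparison: both weight sets pass the desired signal with
# unit gain, but the uneven weights amplify uncorrelated noise far more.
delay_and_sum = noise_sensitivity([0.5, 0.5])
uneven = noise_sensitivity([1.5, -0.5])
```

For unit desired-signal gain, equal weights minimise the sum of squared weights, which is why the delay-and-sum beamformer attains the lowest noise sensitivity in this simplified model.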

[0010] A common approach to increase the robustness of the GSC is to
apply a Quadratic Inequality Constraint (QIC) to the ANC filter $\mathbf{w}_{1:M-1}$, such that the
optimisation criterion (eq. 6) of the GSC is modified into

$$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} E\{|y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H[k]\,\mathbf{y}_{1:M-1}^n[k]|^2\} \quad \text{subject to} \quad \mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1} \leq \beta^2. \qquad \text{(equation 11)}$$

The QIC avoids excessive growth of the filter coefficients $\mathbf{w}_{1:M-1}$. Hence, it reduces the
undesired speech distortion when speech leaks into the noise references.
The QIC-GSC can be implemented using the adaptive scaled projection algorithm
(SPA): at each update step, the quadratic constraint is applied to the newly obtained
ANC filter by scaling the filter coefficients by $\beta / \|\mathbf{w}_{1:M-1}\|$ when $\mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1}$ exceeds $\beta^2$.
Recently, Tian et al. implemented the quadratic constraint by using variable loading
('Recursive least squares implementation for LCMP Beamforming under quadratic
constraint', IEEE Trans. Signal Processing, vol. 49, no. 6, pp. 1138-1145, June 2001).
For Recursive Least Squares (RLS), this technique provides a better approximation to
the optimal solution (eq. 11) than the scaled projection algorithm.
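The scaling step of the scaled projection algorithm is simple enough to state in code. The sketch below shows only the projection that would follow each filter update. The function name `scaled_projection` and the list representation of the filter are assumptions for illustration.

```python
import math


def scaled_projection(w, beta):
    """Enforce the quadratic constraint w^H w <= beta^2.

    Returns w unchanged when the constraint already holds; otherwise
    rescales all coefficients by beta / ||w|| so the norm equals beta.
    Works for real or complex coefficients.
    """
    norm_sq = sum(abs(c) ** 2 for c in w)
    if norm_sq <= beta ** 2:
        return list(w)
    scale = beta / math.sqrt(norm_sq)
    return [scale * c for c in w]


inside = scaled_projection([0.3, 0.4], beta=1.0)   # norm 0.5, left untouched
outside = scaled_projection([3.0, 4.0], beta=1.0)  # norm 5.0, scaled down
```

Because the projection only shrinks the filter when its norm is too large, a small `beta` limits the damage from speech leakage but also caps the achievable noise reduction, which is the trade-off the text attributes to the QIC-GSC.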
Multi-Channel Wiener Filtering (MWF)

[0011] The Multi-channel Wiener filtering (MWF) technique provides a
Minimum Mean Square Error (MMSE) estimate of the desired signal portion in one of
the received microphone signals. In contrast to the GSC, this filtering technique does
not make any a priori assumptions about the signal model and is found to be more
robust. Especially in complex noise scenarios such as multiple noise sources or diffuse
noise, the MWF outperforms the GSC, even when the GSC is supplied with a
robustness constraint.

[0012] The MWF $\bar{\mathbf{w}}_{1:M} \in \mathbb{C}^{ML \times 1}$ minimises the Mean Square Error (MSE)
between a delayed version of the (unknown) speech signal $u_i^s[k-\Delta]$ at the $i$-th (e.g.
first) microphone and the sum $\bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k]$ of the $M$ filtered microphone signals, i.e.

$$\bar{\mathbf{w}}_{1:M} = \arg\min_{\bar{\mathbf{w}}_{1:M}} E\{|u_i^s[k-\Delta] - \bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k]|^2\}, \qquad \text{(equation 12)}$$

leading to

$$\bar{\mathbf{w}}_{1:M} = E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\}^{-1} E\{\mathbf{u}_{1:M}[k]\,u_i^{s,*}[k-\Delta]\}, \qquad \text{(equation 13)}$$

with

$$\bar{\mathbf{w}}_{1:M}^H = [\bar{\mathbf{w}}_1^H \;\; \bar{\mathbf{w}}_2^H \;\; \cdots \;\; \bar{\mathbf{w}}_M^H], \qquad \text{(equation 14)}$$

$$\mathbf{u}_{1:M}^H[k] = [\mathbf{u}_1^H[k] \;\; \mathbf{u}_2^H[k] \;\; \cdots \;\; \mathbf{u}_M^H[k]], \qquad \text{(equation 15)}$$

$$\mathbf{u}_i[k] = [u_i[k] \;\; u_i[k-1] \;\; \cdots \;\; u_i[k-L+1]]^T, \qquad \text{(equation 16)}$$

where $u_i[k]$ comprise a speech component and a noise component.

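[0013] An equivalent approach consists in estimating a delayed version of
the (unknown) noise signal $u_i^n[k-\Delta]$ in the $i$-th microphone, resulting in

$$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} E\{|u_i^n[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]|^2\}, \qquad \text{(equation 17)}$$

and

$$\mathbf{w}_{1:M} = E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\}^{-1} E\{\mathbf{u}_{1:M}[k]\,u_i^{n,*}[k-\Delta]\}, \qquad \text{(equation 18)}$$

where

$$\mathbf{w}_{1:M}^H = [\mathbf{w}_1^H \;\; \mathbf{w}_2^H \;\; \cdots \;\; \mathbf{w}_M^H]. \qquad \text{(equation 19)}$$

The estimate $z[k]$ of the speech component $u_i^s[k-\Delta]$ is then obtained by subtracting the
estimate $\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]$ of $u_i^n[k-\Delta]$ from the delayed, $i$-th microphone signal $u_i[k-\Delta]$,
i.e.

$$z[k] = u_i[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]. \qquad \text{(equation 20)}$$

This is depicted in Fig. 2 for $u_i^n[k-\Delta] = u_1^n[k-\Delta]$.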

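[0014] The residual error energy of the MWF equals

$$E\{|e[k]|^2\} = E\{|u_i^s[k-\Delta] - \bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k]|^2\}, \qquad \text{(equation 21)}$$

and can be decomposed into

$$\underbrace{E\{|u_i^s[k-\Delta] - \bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2\}}_{\varepsilon_d^2} + \underbrace{E\{|\bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2\}}_{\varepsilon_n^2} \qquad \text{(equation 22)}$$

where $\varepsilon_d^2$ equals the speech distortion energy and $\varepsilon_n^2$ the residual noise energy. The
design criterion of the MWF can be generalised to allow for a trade-off between speech
distortion and noise reduction, by incorporating a weighting factor $\mu$ with $\mu \in [0, \infty]$

$$\bar{\mathbf{w}}_{1:M} = \arg\min_{\bar{\mathbf{w}}_{1:M}} E\{|u_i^s[k-\Delta] - \bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2\} + \mu\, E\{|\bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2\}. \qquad \text{(equation 23)}$$

The solution of (eq. 23) is given by

$$\bar{\mathbf{w}}_{1:M} = E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k] + \mu\, \mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\}^{-1} E\{\mathbf{u}_{1:M}^s[k]\,u_i^{s,*}[k-\Delta]\}. \qquad \text{(equation 24)}$$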

[0015] Equivalently, the optimisation criterion for $\mathbf{w}_{1:M}$ in (eq. 17) can be
modified into

$$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} E\{|\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2\} + \mu\, E\{|u_i^n[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2\}, \qquad \text{(equation 25)}$$

resulting in

$$\mathbf{w}_{1:M} = E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k] + \tfrac{1}{\mu}\, \mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\}^{-1} E\{\mathbf{u}_{1:M}^n[k]\,u_i^{n,*}[k-\Delta]\}. \qquad \text{(equation 26)}$$

In the sequel, (eq. 26) will be referred to as the Speech Distortion Weighted Multi-
channel Wiener Filter (SDW-MWF).

The factor $\mu \in [0, \infty]$ trades off speech distortion versus noise reduction. If $\mu = 1$, the
MMSE criterion (eq. 12) or (eq. 17) is obtained. If $\mu > 1$, the residual noise level will be
reduced at the expense of increased speech distortion. By setting $\mu$ to $\infty$, all emphasis is
put on noise reduction and speech distortion is completely ignored. Setting $\mu$ to 0 on the
other hand, results in no noise reduction.
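The effect of the weighting factor is easiest to see in the scalar case. The sketch below assumes a single microphone, a single filter tap, no delay, and uncorrelated speech and noise with known powers, so the noise-estimating filter reduces to one real number. The function name `sdw_scalar` and the scalar simplification are assumptions for illustration.

```python
def sdw_scalar(p_speech, p_noise, mu):
    """Scalar speech-distortion-weighted Wiener filter.

    The noise estimate is w * u and the speech estimate is (1 - w) * u,
    with w = p_noise / (p_noise + p_speech / mu), written here in a form
    that stays finite for mu = 0.
    Returns (w, speech_distortion_energy, residual_noise_energy).
    """
    w = mu * p_noise / (mu * p_noise + p_speech)
    speech_distortion = w ** 2 * p_speech        # energy of s - (1 - w) s
    residual_noise = (1.0 - w) ** 2 * p_noise    # energy of (1 - w) n
    return w, speech_distortion, residual_noise


# Speech power 4, noise power 1, for increasing emphasis on noise reduction.
no_reduction = sdw_scalar(4.0, 1.0, mu=0.0)
mmse = sdw_scalar(4.0, 1.0, mu=1.0)
aggressive = sdw_scalar(4.0, 1.0, mu=4.0)
```

As `mu` grows the residual noise energy falls while the speech distortion energy rises, and `mu = 0` leaves the input untouched, matching the qualitative behaviour described in the text.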
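[0016] In practice, the correlation matrix $E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\}$ is unknown.
During periods of speech, the inputs $u_i[k]$ consist of speech + noise, i.e.,
$u_i[k] = u_i^s[k] + u_i^n[k]$, $i = 1, \ldots, M$. During periods of noise, only the noise component
$u_i^n[k]$ is observed. Assuming that the speech signal and the noise signal are
uncorrelated, $E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\}$ can be estimated as

$$E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\} = E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\} - E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\}, \qquad \text{(equation 27)}$$

where the second order statistics $E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\}$ are estimated during speech + noise
and the second order statistics $E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\}$ during periods of noise only. As for
the GSC, a robust speech detection is thus needed. Using (eq. 27), (eq. 24) and (eq. 26)
can be re-written as:

$$\bar{\mathbf{w}}_{1:M} = \left(E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\} + (\mu - 1)\, E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\}\right)^{-1} \times \left(E\{\mathbf{u}_{1:M}[k]\,u_i^*[k-\Delta]\} - E\{\mathbf{u}_{1:M}^n[k]\,u_i^{n,*}[k-\Delta]\}\right) \qquad \text{(equation 28)}$$

and

$$\mathbf{w}_{1:M} = \left(E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\} + \tfrac{1}{\mu}\left(E\{\mathbf{u}_{1:M}[k]\,\mathbf{u}_{1:M}^H[k]\} - E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\}\right)\right)^{-1} E\{\mathbf{u}_{1:M}^n[k]\,u_i^{n,*}[k-\Delta]\}. \qquad \text{(equation 29)}$$

The Wiener filter may be computed at each time instant $k$ by means of a Generalised
Singular Value Decomposition (GSVD) of a speech + noise and noise data matrix. A
cheaper recursive alternative based on a QR-decomposition is also available.
Additionally, a subband implementation increases the resulting speech intelligibility and
reduces complexity, making it suitable for hearing aid applications.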
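The subtraction used to estimate the speech statistics can be illustrated with sample averages. The sketch again assumes the scalar case (one microphone, one tap, no delay) and that a speech detector has already split the data into speech + noise samples and noise-only samples. The function name `estimate_sdw_filter` is hypothetical, and the sketch does not guard against a negative speech-power estimate, which can occur with short or noisy records.

```python
def estimate_sdw_filter(speech_plus_noise, noise_only, mu=1.0):
    """Estimate the scalar noise-estimating filter from labelled samples.

    The speech power is not observable directly; it is obtained as the
    power measured during speech + noise minus the power measured during
    noise only, assuming speech and noise are uncorrelated.
    Returns (estimated_speech_power, filter_coefficient).
    """
    p_u = sum(x * x for x in speech_plus_noise) / len(speech_plus_noise)
    p_n = sum(x * x for x in noise_only) / len(noise_only)
    p_s = p_u - p_n                       # subtraction of noise statistics
    w = p_n / (p_n + p_s / mu)            # scalar speech-distortion-weighted filter
    return p_s, w


# Toy data: power 9 during speech + noise, power 1 during noise only.
p_s_hat, w_hat = estimate_sdw_filter([3.0, -3.0, 3.0, -3.0], [1.0, -1.0], mu=2.0)
```

Errors in the speech detector corrupt both power estimates, which is why the text stresses that a robust speech detection is needed.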

Aims of the invention

[0017] The present invention aims to provide a method and device for adaptively
reducing the noise, especially the background noise, in speech enhancement
applications, thereby overcoming the problems and drawbacks of the state of the art
solutions.






Atty. Docket No. COCH-0185-US1 / Customer No. 22,506 / Client Ref. No. CID 311 US

SUMMARY

[0010] In one aspect of the present invention, a method to reduce noise
in a noisy speech signal is disclosed. The method comprises
applying at least two versions of the noisy speech signal to a first filter, whereby that
first filter outputs a speech reference signal and at least one noise reference signal,
applying a filtering operation to each of the at least one noise reference signals, and
subtracting from the speech reference signal each of the filtered noise reference signals,
wherein the filtering operation is performed with filters having
filter coefficients determined by taking into account speech leakage contributions in the
at least one noise reference signal.

[0019] In a typical embodiment the at least two versions of the noisy speech signal
are signals from at least two microphones picking up the noisy speech signal.

[0020] Preferably the first filter is a spatial pre-processor filter, comprising
a beamformer filter and a blocking matrix filter.

[0021] In an advantageous embodiment the speech reference signal is
output by the beamformer filter and the at least one noise reference signal is output by
the blocking matrix filter.

[0022] In a preferred embodiment the speech reference signal is delayed
before performing the subtraction step.

[0023] Advantageously a filtering operation is additionally applied to the
speech reference signal, where the filtered speech reference signal is also subtracted
from the speech reference signal.

[0024] In another preferred embodiment the method further comprises the
step of regularly adapting the filter coefficients. Thereby the speech leakage
contributions in the at least one noise reference signal are taken into account or,
alternatively, both the speech leakage contributions in the at least one noise reference
signal and the speech contribution in the speech reference signal.

The invention also relates to the use of a method to reduce noise as described
previously in a speech enhancement application.
[0011] In another aspect of the invention, a signal processing circuit for reducing
noise in a noisy speech signal is disclosed. This signal processing circuit comprises:

• a first filter having at least two inputs and arranged for outputting a speech reference
signal and at least one noise reference signal,

• a filter to apply the speech reference signal to and filters to apply each of the at least
one noise reference signals to, and

• summation means for subtracting from the speech reference signal the filtered
speech reference signal and each of the filtered noise reference signals.

[0027] Advantageously, the first filter is a spatial pre-processor filter,
comprising a beamformer filter and a blocking matrix filter.

[0028] In an alternative embodiment the beamformer filter is a delay-and-
sum beamformer.

[0029] The invention also relates to a hearing device comprising a signal
processing circuit as described. By hearing device is meant an acoustical hearing aid
(either external or implantable) or a cochlear implant.



BRIEF DESCRIPTION OF THE DRAWINGS

[0012] Fig. 1 represents the concept of the Generalised Sidelobe Canceller in
accordance with one embodiment of the present invention.

[0013] Fig. 2 represents an equivalent approach of multi-channel Wiener filtering in
accordance with one embodiment of the present invention.

[0014] Fig. 3 represents a Spatially Pre-processed SDW-MWF in accordance with
one embodiment of the present invention.

[0015] Fig. 4 represents the decomposition of SP-SDW-MWF with $\mathbf{w}_0$ in a multi-
channel filter $\mathbf{w}_d$ and single-channel postfilter $\mathbf{e}_1 - \mathbf{w}_0$ in accordance with one embodiment
of the present invention.

[0016] Fig. 5 represents the set-up for the experiments in accordance with one
embodiment of the present invention.

[0017] Fig. 6 represents the influence of $1/\mu$ on the performance of the SDR-GSC for
different gain mismatches $\Upsilon_2$ at the second microphone in accordance with one
embodiment of the present invention.

[0018] Fig. 7 represents the influence of $1/\mu$ on the performance of the SP-SDW-
MWF with $\mathbf{w}_0$ for different gain mismatches $\Upsilon_2$ at the second microphone in
accordance with one embodiment of the present invention.

[0019] Fig. 8 represents the $\Delta\mathrm{SNR}_{\mathrm{intellig}}$ and $\mathrm{SD}_{\mathrm{intellig}}$ for QIC-GSC as a function of $\beta^2$
for different gain mismatches $\Upsilon_2$ at the second microphone in accordance with one
embodiment of the present invention.

[0020] Fig. 9 represents the complexity of TD and FD Stochastic Gradient (SG)
algorithm with LP filter as a function of filter length $L$ per channel; $M=3$ (for
comparison, the complexity of the standard NLMS ANC and SPA are depicted too) in
accordance with one embodiment of the present invention.

[0021] Fig. 10 represents the performance of different FD Stochastic Gradient (FD-
SG) algorithms; (a) Stationary speech-like noise at 90°; (b) Multi-talker babble noise at
90° in accordance with one embodiment of the present invention.

[0022] Fig. 11 represents the influence of the LP filter on performance of FD
stochastic gradient SP-SDW-MWF ($1/\mu = 0.5$) without $\mathbf{w}_0$ and with $\mathbf{w}_0$. Babble noise at
90° in accordance with one embodiment of the present invention.

[0023] Fig. 12 represents the convergence behaviour of FD-SG for $\lambda = 0$ and
$\lambda = 0.9998$. The noise source position suddenly changes from 90° to 180° and vice versa
in accordance with one embodiment of the present invention.

[0024] Fig. 13 represents the performance of FD stochastic gradient implementation
of SP-SDW-MWF with LP filter ($\lambda = 0.9998$) in a multiple noise source scenario in
accordance with one embodiment of the present invention.

[0025] Fig. 14 represents the performance of FD SPA in a multiple noise source
scenario in accordance with one embodiment of the present invention.

[0026] Fig. 15 represents the SNR improvement of the frequency-domain SP-SDW-
MWF (Algorithm 2 and Algorithm 4) in a multiple noise source scenario in accordance
with one embodiment of the present invention.

[0027] Fig. 16 represents the speech distortion of the frequency-domain SP-SDW-
MWF (Algorithm 2 and Algorithm 4) in a multiple noise source scenario in accordance
with one embodiment of the present invention.

DETAILED DESCRIPTION

[0028] In speech communication applications, such as teleconferencing, hands-free
telephony and hearing aids, the presence of background noise may significantly reduce
the intelligibility of the desired speech signal. Hence, the use of a noise reduction
algorithm is necessary. Multi-microphone systems exploit spatial information in
addition to temporal and spectral information of the desired signal and noise signal and
are thus preferred to single microphone procedures. Because of aesthetic reasons, multi-
microphone techniques for e.g., hearing aid applications go together with the use of
small-sized arrays. Considerable noise reduction can be achieved with such arrays, but
at the expense of an increased sensitivity to errors in the assumed signal model such as
microphone mismatch, reverberation, ... (see e.g. Stadler & Rabinowitz, 'On the
potential of fixed arrays for hearing aids', J. Acoust. Soc. Amer., vol. 94, no. 3, pp.
1332-1342, Sep. 1993). In hearing aids, microphones are rarely matched in gain and
phase. Gain and phase differences between microphone characteristics can amount up to
6 dB and 10°, respectively.

[0029] A widely studied multi-channel adaptive noise reduction algorithm is the
Generalised Sidelobe Canceller (GSC) (see e.g. Griffiths & Jim, 'An alternative
approach to linearly constrained adaptive beamforming', IEEE Trans. Antennas
Propag., vol. 30, no. 1, pp. 27-34, Jan. 1982 and US5473701 'Adaptive microphone
array'). The GSC consists of a fixed, spatial pre-processor, which includes a fixed
beamformer and a blocking matrix, and an adaptive stage based on an Adaptive Noise
Canceller (ANC). The ANC minimizes the output noise power while the
blocking matrix should avoid speech leakage into the noise references. The standard
GSC assumes the desired speaker location, the microphone characteristics and positions
to be known, and reflections of the speech signal to be absent. If these assumptions are
fulfilled, it provides an undistorted enhanced speech signal with minimum residual
noise. However, in reality these assumptions are often violated, resulting in so-called
speech leakage and hence speech distortion. To limit speech distortion, the ANC is
typically adapted during periods of noise only. When used in combination with
small-sized arrays, e.g., in hearing aid applications, an additional robustness constraint (see
Cox et al., 'Robust adaptive beamforming', IEEE Trans. Acoust. Speech and Signal
Processing, vol. 35, no. 10, pp. 1365-1376, Oct. 1987) is required to guarantee
performance in the presence of small errors in the assumed signal model, such as
microphone mismatch. A widely applied method consists of imposing a Quadratic
Inequality Constraint on the ANC (QIC-GSC). For Least Mean Squares (LMS)
updating, the Scaled Projection Algorithm (SPA) is a simple and effective technique
that imposes this constraint. However, using the QIC-GSC comes at the expense of less
noise reduction.

[0030] A Multi-channel Wiener Filtering (MWF) technique has been proposed
(see Doclo & Moonen, 'GSVD-based optimal filtering for single and multimicrophone
speech enhancement', IEEE Trans. Signal Processing, vol. 50, no. 9, pp. 2230-2244,
Sep. 2002) that provides a Minimum Mean Square Error (MMSE) estimate of the
desired signal portion in one of the received microphone signals. In contrast to the ANC
of the GSC, the MWF is able to take speech distortion into account in its optimisation
criterion, resulting in the Speech Distortion Weighted Multi-channel Wiener Filter
(SDW-MWF). The (SDW-)MWF technique is based solely on estimates of the
second order statistics of the recorded speech signal and the noise signal. A robust
speech detection is thus again needed. In contrast to the GSC, the (SDW-)MWF does
not make any a priori assumptions about the signal model such that no or a less severe
robustness constraint is needed to guarantee performance when used in combination
with small-sized arrays. Especially in complicated noise scenarios such as multiple
noise sources or diffuse noise, the (SDW-)MWF outperforms the GSC, even when the
GSC is supplemented with a robustness constraint.

[0031] A possible implementation of the (SDW-)MWF is based on a
Generalised Singular Value Decomposition (GSVD) of an input data matrix and a noise
data matrix. A cheaper alternative based on a QR Decomposition (QRD) has been
proposed in Rombouts & Moonen, 'QRD-based unconstrained optimal filtering for
acoustic noise reduction', Signal Processing, vol. 83, no. 9, pp. 1889-1904, Sep. 2003.
Additionally, a subband implementation results in improved intelligibility at a
significantly lower cost compared to the fullband approach. However, in contrast to the
GSC and the QIC-GSC, no cheap stochastic gradient based implementation of the
(SDW-)MWF is available yet. In Nordholm et al., 'Adaptive microphone array
employing calibration signals: an analytical evaluation', IEEE Trans. Speech, Audio
Processing, vol. 7, no. 3, pp. 241-252, May 1999, an LMS based algorithm for the
MWF has been developed. However, said algorithm needs recordings of calibration
signals. Since room acoustics, microphone characteristics and the location of the desired
speaker change over time, frequent re-calibration is required, making this approach
cumbersome and expensive. Also an LMS based SDW-MWF has been proposed that
avoids the need for calibration signals (see Florencio & Malvar, 'Multichannel filtering
for optimum noise reduction in microphone arrays', Int. Conf. on Acoust., Speech, and
Signal Proc., Salt Lake City, USA, pp. 197-200, May 2001). This algorithm however
relies on some independence assumptions that are not necessarily satisfied, resulting in
degraded performance.

[0032] The GSC and MWF techniques are now presented in more detail.

Generalized Sidelobe Canceller (GSC)

[0033] Fig. 1 describes the concept of the Generalized Sidelobe Canceller (GSC),
which consists of a fixed, spatial pre-processor, i.e. a fixed beamformer $A(z)$ and a
blocking matrix $B(z)$, and an ANC. Given $M$ microphone signals

$$u_i[k] = u_i^s[k] + u_i^n[k], \quad i = 1, \ldots, M$$ (equation 30)

with $u_i^s[k]$ the desired speech contribution and $u_i^n[k]$ the noise contribution, the fixed
beamformer $A(z)$ (e.g. delay-and-sum) creates a so-called speech reference

$$y_0[k] = y_0^s[k] + y_0^n[k]$$ (equation 31)

by steering a beam towards the direction of the desired signal, and comprising a speech
contribution $y_0^s[k]$ and a noise contribution $y_0^n[k]$. The blocking matrix $B(z)$ creates
$M-1$ so-called noise references

$$y_i[k] = y_i^s[k] + y_i^n[k], \quad i = 1, \ldots, M-1$$ (equation 32)

by steering zeroes towards the direction of the desired signal source such that the noise
contributions $y_i^n[k]$ are dominant compared to the speech leakage contributions $y_i^s[k]$.
In the sequel, the superscripts $s$ and $n$ are used to refer to the speech and the noise
contribution of a signal. During periods of speech + noise, the references $y_i[k]$,
$i = 0, \ldots, M-1$ contain speech + noise. During periods of noise only, the references only
consist of a noise component, i.e. $y_i[k] = y_i^n[k]$. The second order statistics of the noise
signal are assumed to be quite stationary such that they can be estimated during periods
of noise only.

[0034] To design the fixed, spatial pre-processor, assumptions are made about the
microphone characteristics, the speaker position and the microphone positions and
furthermore reverberation is assumed to be absent. If these assumptions are satisfied,
the noise references do not contain any speech, i.e., $y_i^s[k] = 0$ for $i = 1, \ldots, M-1$.
However, in practice, these assumptions are often violated (e.g. due to microphone
mismatch and reverberation) such that speech leaks into the noise references. To limit
the effect of such speech leakage, the ANC filter $\mathbf{w}_{1:M-1} \in \mathbb{C}^{(M-1)L \times 1}$

$$\mathbf{w}_{1:M-1}^H = [\mathbf{w}_1^H \;\; \mathbf{w}_2^H \;\; \ldots \;\; \mathbf{w}_{M-1}^H]$$ (equation 33)

where

$$\mathbf{w}_i = [w_i[0] \;\; w_i[1] \;\; \ldots \;\; w_i[L-1]]^T$$ (equation 34)

with $L$ the filter length, is adapted during periods of noise only. (Note that in a
time-domain implementation the input signals of the adaptive filter $\mathbf{w}_{1:M-1}$ and the
filter $\mathbf{w}_{1:M-1}$ are real. In the sequel the formulas are generalised to complex input
signals such that they can also be applied to a subband implementation.) Hence, the ANC
filter $\mathbf{w}_{1:M-1}$ minimises the output noise power, i.e.

$$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} E\{|y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^n[k]|^2\}$$ (equation 35)

leading to

$$\mathbf{w}_{1:M-1} = E\{\mathbf{y}_{1:M-1}^n[k]\, \mathbf{y}_{1:M-1}^{n,H}[k]\}^{-1} E\{\mathbf{y}_{1:M-1}^n[k]\, y_0^{n,*}[k-\Delta]\}$$ (equation 36)

where

$$\mathbf{y}_{1:M-1}^{n,H}[k] = [\mathbf{y}_1^{n,H}[k] \;\; \mathbf{y}_2^{n,H}[k] \;\; \ldots \;\; \mathbf{y}_{M-1}^{n,H}[k]]$$ (equation 37)

$$\mathbf{y}_i^n[k] = [y_i^n[k] \;\; y_i^n[k-1] \;\; \ldots \;\; y_i^n[k-L+1]]^T$$ (equation 38)

and where $\Delta$ is a delay applied to the speech reference to allow for non-causal taps in
the filter $\mathbf{w}_{1:M-1}$. The delay $\Delta$ is usually set to $\lceil L/2 \rceil$, where $\lceil x \rceil$ denotes the
smallest integer equal to or larger than $x$. The subscript $1:M-1$ in $\mathbf{w}_{1:M-1}$ and
$\mathbf{y}_{1:M-1}$ refers to the
subscripts of the first and the last channel component of the adaptive filter and input
vector, respectively.
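
By way of illustration only, the noise-only adaptation of the ANC filter can be sketched in a few lines. The sketch below is not the implementation of the invention: it assumes a single real-valued noise reference, a normalised LMS (NLMS) update chosen here for simplicity, and a synthetic noise path with coefficients 0.5 and -0.3. The names `nlms_anc`, `noise_only` and `step` are hypothetical.

```python
import math
import random
from typing import List


def nlms_anc(y0: List[float], y1: List[float], noise_only: List[bool],
             L: int, step: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Adapt an L-tap ANC filter on one noise reference (real signals).

    y0: speech reference, y1: noise reference, noise_only[k]: True when
    sample k contains noise only (adaptation is frozen otherwise).
    """
    delta = math.ceil(L / 2)          # delay applied to the speech reference
    w = [0.0] * L
    for k in range(L - 1, len(y0)):
        if not noise_only[k]:
            continue                  # adapt during periods of noise only
        x = [y1[k - j] for j in range(L)]            # [y1[k], ..., y1[k-L+1]]
        e = y0[k - delta] - sum(wj * xj for wj, xj in zip(w, x))
        norm = eps + sum(xj * xj for xj in x)
        w = [wj + step * e * xj / norm for wj, xj in zip(w, x)]
    return w


# Synthetic noise-only data: the noise in the speech reference is a
# filtered copy of the noise reference (assumed path: 0.5, -0.3).
rng = random.Random(0)
y1 = [rng.gauss(0.0, 1.0) for _ in range(3000)]
y0 = [0.5 * y1[k] - 0.3 * (y1[k - 1] if k > 0 else 0.0) for k in range(len(y1))]
w = nlms_anc(y0, y1, [True] * len(y1), L=4)
print([round(c, 3) for c in w])
```

Because the speech reference is delayed by $\Delta = \lceil L/2 \rceil = 2$ samples, the two non-zero taps of the assumed noise path appear at tap positions 2 and 3 of the adapted filter, which is the sense in which the delay makes room for non-causal taps.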

[0035] Under ideal conditions ($y_i^s[k] = 0$, $i = 1, \ldots, M-1$), the GSC minimises the
residual noise while not distorting the desired speech signal, i.e. $z^s[k] = y_0^s[k-\Delta]$.
However, when used in combination with small-sized arrays, a small error in the
assumed signal model (resulting in $y_i^s[k] \neq 0$, $i = 1, \ldots, M-1$) already suffices to produce
a significantly distorted output speech signal $z^s[k]$

$$z^s[k] = y_0^s[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^s[k]$$ (equation 39)

even when only adapting during noise-only periods, such that a robustness constraint on
$\mathbf{w}_{1:M-1}$ is required. In addition, the fixed beamformer $A(z)$ should be designed such that
the distortion in the speech reference $y_0^s[k]$ is minimal for all possible model errors. In
the sequel, a delay-and-sum beamformer is used. For small-sized arrays, this
beamformer offers sufficient robustness against signal model errors, as it minimises the
noise sensitivity. The noise sensitivity is defined as the ratio of the spatially white noise
gain to the gain of the desired signal and is often used to quantify the sensitivity of an
algorithm against errors in the assumed signal model. When statistical knowledge is
given about the signal model errors that occur in practice, the fixed beamformer and the
blocking matrix can be further optimised.

[0036] A common approach to increase the robustness of the GSC is to apply a
Quadratic Inequality Constraint (QIC) to the ANC filter $\mathbf{w}_{1:M-1}$, such that the
optimisation criterion (eq.) of the GSC is modified into

$$\mathbf{w}_{1:M-1} = \arg\min_{\mathbf{w}_{1:M-1}} E\{|y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^n[k]|^2\}$$ (equation 40)

subject to $\mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1} \leq \beta^2$.

The QIC avoids excessive growth of the filter coefficients $\mathbf{w}_{1:M-1}$. Hence, it reduces the
undesired speech distortion when speech leaks into the noise references.
The QIC-GSC can be implemented using the adaptive scaled projection algorithm
(SPA): at each update step, the quadratic constraint is applied to the newly obtained
ANC filter by scaling the filter coefficients by $\beta / \|\mathbf{w}_{1:M-1}\|$ when
$\mathbf{w}_{1:M-1}^H \mathbf{w}_{1:M-1}$ exceeds $\beta^2$.
Recently, Tian et al. implemented the quadratic constraint by using variable loading
('Recursive least squares implementation for LCMP Beamforming under quadratic
constraint', IEEE Trans. Signal Processing, vol. 49, no. 6, pp. 1138-1145, June 2001).
For Recursive Least Squares (RLS), this technique provides a better approximation to
the optimal solution (eq.) than the scaled projection algorithm.
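
The scaling step of the scaled projection algorithm is simple enough to state as code. The following is a minimal sketch that covers only the projection applied after each filter update, not the update itself; the function name `scaled_projection` and the example values are assumptions made for illustration.

```python
import math
from typing import List


def scaled_projection(w: List[complex], beta: float) -> List[complex]:
    """Enforce w^H w <= beta^2 by rescaling the filter when it is violated."""
    norm_sq = sum(abs(c) ** 2 for c in w)
    if norm_sq > beta ** 2:
        scale = beta / math.sqrt(norm_sq)      # beta / ||w||
        return [scale * c for c in w]
    return list(w)


inside = scaled_projection([0.3, 0.4], beta=1.0)     # norm 0.5: left unchanged
outside = scaled_projection([3.0, 4.0], beta=1.0)    # norm 5: scaled back to norm beta
print(inside, outside)
```

The projection leaves a filter inside the constraint region untouched, so the constraint only becomes active once the coefficients have grown beyond the chosen bound.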



Multi-Channel Wiener Filtering (MWF)



[0037] The Multi-channel Wiener filtering (MWF) technique provides a Minimum
Mean Square Error (MMSE) estimate of the desired signal portion in one of the
received microphone signals. In contrast to the GSC, this filtering technique does not
make any a priori assumptions about the signal model and is found to be more robust.
Especially in complex noise scenarios such as multiple noise sources or diffuse noise,
the MWF outperforms the GSC, even when the GSC is supplied with a robustness
constraint.

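[0038] The MWF $\bar{\mathbf{w}}_{1:M} \in \mathbb{C}^{ML \times 1}$ minimises the Mean Square Error (MSE) between a
delayed version of the (unknown) speech signal $u_i^s[k-\Delta]$ at the $i$-th (e.g. first)
microphone and the sum $\bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k]$ of the $M$ filtered microphone signals, i.e.

$$\bar{\mathbf{w}}_{1:M} = \arg\min_{\bar{\mathbf{w}}_{1:M}} E\{|u_i^s[k-\Delta] - \bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k]|^2\}$$ (equation 41)

leading to

$$\bar{\mathbf{w}}_{1:M} = E\{\mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k]\}^{-1} E\{\mathbf{u}_{1:M}[k]\, u_i^{s,*}[k-\Delta]\}$$ (equation 42)

with

$$\bar{\mathbf{w}}_{1:M}^H = [\bar{\mathbf{w}}_1^H \;\; \bar{\mathbf{w}}_2^H \;\; \ldots \;\; \bar{\mathbf{w}}_M^H]$$ (equation 43)

$$\mathbf{u}_{1:M}^H[k] = [\mathbf{u}_1^H[k] \;\; \mathbf{u}_2^H[k] \;\; \ldots \;\; \mathbf{u}_M^H[k]]$$ (equation 44)

$$\mathbf{u}_i[k] = [u_i[k] \;\; u_i[k-1] \;\; \ldots \;\; u_i[k-L+1]]^T$$ (equation 45)

where $u_i[k]$ comprises a speech component and a noise component.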

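[0039] An equivalent approach consists in estimating a delayed version of the
(unknown) noise signal $u_i^n[k-\Delta]$ in the $i$-th microphone, resulting in

$$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} E\{|u_i^n[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]|^2\}$$ (equation 46)

and

$$\mathbf{w}_{1:M} = E\{\mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k]\}^{-1} E\{\mathbf{u}_{1:M}[k]\, u_i^{n,*}[k-\Delta]\}$$ (equation 47)

where

$$\mathbf{w}_{1:M}^H = [\mathbf{w}_1^H \;\; \mathbf{w}_2^H \;\; \ldots \;\; \mathbf{w}_M^H]$$ (equation 48)

The estimate $z[k]$ of the speech component $u_i^s[k-\Delta]$ is then obtained by subtracting the
estimate $\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]$ of $u_i^n[k-\Delta]$ from the delayed, $i$-th microphone signal $u_i[k-\Delta]$,
i.e.

$$z[k] = u_i[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}[k]$$ (equation 49)

This is depicted in Fig. 2 for $u_i^n[k-\Delta] = u_1^n[k-\Delta]$.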

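[0040] The residual error energy of the MWF equals

$$E\{|e[k]|^2\} = E\{|u_i^s[k-\Delta] - \bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}[k]|^2\}$$ (equation 50)

and can be decomposed into

$$\underbrace{E\{|u_i^s[k-\Delta] - \bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2\}}_{\varepsilon_d^2} + \underbrace{E\{|\bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2\}}_{\varepsilon_n^2}$$ (equation 51)

where $\varepsilon_d^2$ equals the speech distortion energy and $\varepsilon_n^2$ the residual noise energy. The
design criterion of the MWF can be generalised to allow for a trade-off between speech
distortion and noise reduction, by incorporating a weighting factor $\mu$ with $\mu \in [0, \infty]$

$$\bar{\mathbf{w}}_{1:M} = \arg\min_{\bar{\mathbf{w}}_{1:M}} E\{|u_i^s[k-\Delta] - \bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2\} + \mu E\{|\bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2\}$$ (equation 52)

The solution of (eq.) is given by

$$\bar{\mathbf{w}}_{1:M} = \left( E\{\mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k]\} + \mu E\{\mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k]\} \right)^{-1} E\{\mathbf{u}_{1:M}^s[k]\, u_i^{s,*}[k-\Delta]\}$$ (equation 53)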
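
For completeness, the step from the weighted criterion to its closed-form solution can be written out. The derivation below is a sketch; the shorthand $\mathbf{R}_s$, $\mathbf{R}_n$ and $\mathbf{r}_s$ is introduced here for brevity and does not appear in the original text.

```latex
\begin{align*}
\mathbf{R}_s &= E\{\mathbf{u}_{1:M}^s[k]\,\mathbf{u}_{1:M}^{s,H}[k]\}, \quad
\mathbf{R}_n = E\{\mathbf{u}_{1:M}^n[k]\,\mathbf{u}_{1:M}^{n,H}[k]\}, \quad
\mathbf{r}_s = E\{\mathbf{u}_{1:M}^s[k]\,u_i^{s,*}[k-\Delta]\}, \\
J(\bar{\mathbf{w}}_{1:M}) &= E\{|u_i^s[k-\Delta] - \bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2\}
  + \mu\, E\{|\bar{\mathbf{w}}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2\} \\
 &= E\{|u_i^s[k-\Delta]|^2\} - \bar{\mathbf{w}}_{1:M}^H \mathbf{r}_s - \mathbf{r}_s^H \bar{\mathbf{w}}_{1:M}
  + \bar{\mathbf{w}}_{1:M}^H (\mathbf{R}_s + \mu \mathbf{R}_n)\, \bar{\mathbf{w}}_{1:M}, \\
\frac{\partial J}{\partial \bar{\mathbf{w}}_{1:M}^{*}} &= -\mathbf{r}_s + (\mathbf{R}_s + \mu \mathbf{R}_n)\, \bar{\mathbf{w}}_{1:M} = \mathbf{0}
 \;\;\Longrightarrow\;\;
 \bar{\mathbf{w}}_{1:M} = (\mathbf{R}_s + \mu \mathbf{R}_n)^{-1}\, \mathbf{r}_s .
\end{align*}
```

Setting $\mu = 1$ in the last line recovers the MMSE solution, since $\mathbf{R}_s + \mathbf{R}_n$ is the correlation matrix of the microphone signals when speech and noise are uncorrelated.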

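[0041] Equivalently, the optimisation criterion for $\mathbf{w}_{1:M}$ in (eq.) can be modified
into

$$\mathbf{w}_{1:M} = \arg\min_{\mathbf{w}_{1:M}} E\{|\mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^s[k]|^2\} + \mu E\{|u_i^n[k-\Delta] - \mathbf{w}_{1:M}^H \mathbf{u}_{1:M}^n[k]|^2\}$$ (equation 54)

resulting in

$$\mathbf{w}_{1:M} = \left( E\{\mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k]\} + \frac{1}{\mu} E\{\mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k]\} \right)^{-1} E\{\mathbf{u}_{1:M}^n[k]\, u_i^{n,*}[k-\Delta]\}$$ (equation 55)

In the sequel, (eq.) will be referred to as the Speech Distortion Weighted Multi-channel
Wiener Filter (SDW-MWF).

The factor $\mu \in [0, \infty]$ trades off speech distortion versus noise reduction. If $\mu = 1$, the
MMSE criterion (eq.) or (eq.) is obtained. If $\mu > 1$, the residual noise level will be
reduced at the expense of increased speech distortion. By setting $\mu$ to $\infty$, all emphasis is
put on noise reduction and speech distortion is completely ignored. Setting $\mu$ to 0, on the
other hand, results in no noise reduction.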
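
The effect of the weighting factor can be made concrete in the simplest possible setting. The sketch below assumes a single channel with a single tap and no delay, so that the weighted Wiener solution collapses to the scalar gain $P_s / (P_s + \mu P_n)$; the powers `P_S` and `P_N` and the function names are illustrative assumptions, not values from the experiments.

```python
from typing import Tuple


def sdw_gain(p_s: float, p_n: float, mu: float) -> float:
    """Single-channel, single-tap SDW Wiener gain: P_s / (P_s + mu * P_n)."""
    return p_s / (p_s + mu * p_n)


def error_energies(p_s: float, p_n: float, mu: float) -> Tuple[float, float]:
    """Return (speech distortion energy, residual noise energy)."""
    g = sdw_gain(p_s, p_n, mu)
    return (1.0 - g) ** 2 * p_s, g ** 2 * p_n


P_S, P_N = 4.0, 1.0   # assumed speech and noise powers
for mu in (0.0, 1.0, 4.0, 100.0):
    distortion, residual = error_energies(P_S, P_N, mu)
    print(f"mu={mu:6.1f} gain={sdw_gain(P_S, P_N, mu):.3f} "
          f"distortion={distortion:.3f} residual_noise={residual:.3f}")
```

For $\mu = 0$ the gain is one (no noise reduction and no distortion), for $\mu = 1$ it is the usual Wiener gain, and for growing $\mu$ the residual noise falls while the speech distortion rises.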

[0042] In practice, the correlation matrix $E\{\mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k]\}$ is unknown. During
periods of speech, the inputs $u_i[k]$ consist of speech + noise, i.e.,
$u_i[k] = u_i^s[k] + u_i^n[k]$, $i = 1, \ldots, M$. During periods of noise, only the noise component
$u_i^n[k]$ is observed. Assuming that the speech signal and the noise signal are
uncorrelated, $E\{\mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k]\}$ can be estimated as

$$E\{\mathbf{u}_{1:M}^s[k]\, \mathbf{u}_{1:M}^{s,H}[k]\} = E\{\mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k]\} - E\{\mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k]\}$$ (equation 56)

where the second order statistics $E\{\mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k]\}$ are estimated during speech + noise
and the second order statistics $E\{\mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k]\}$ during periods of noise only. As for
the GSC, a robust speech detection is thus needed. Using (eq.), (eq.) and (eq.) can be
re-written as:

$$\bar{\mathbf{w}}_{1:M} = \left( E\{\mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k]\} + (\mu - 1) E\{\mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k]\} \right)^{-1} \times \left( E\{\mathbf{u}_{1:M}[k]\, u_i^*[k-\Delta]\} - E\{\mathbf{u}_{1:M}^n[k]\, u_i^{n,*}[k-\Delta]\} \right)$$ (equation 57)

and

$$\mathbf{w}_{1:M} = \left( \frac{1}{\mu} E\{\mathbf{u}_{1:M}[k]\, \mathbf{u}_{1:M}^H[k]\} + \left(1 - \frac{1}{\mu}\right) E\{\mathbf{u}_{1:M}^n[k]\, \mathbf{u}_{1:M}^{n,H}[k]\} \right)^{-1} E\{\mathbf{u}_{1:M}^n[k]\, u_i^{n,*}[k-\Delta]\}$$ (equation 58)

The Wiener filter may be computed at each time instant $k$ by means of a Generalised
Singular Value Decomposition (GSVD) of a speech + noise and noise data matrix. A
cheaper recursive alternative based on a QR-decomposition is also available.
Additionally, a subband implementation increases the resulting speech intelligibility and
reduces complexity, making it suitable for hearing aid applications.
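
The practical point of the rewriting is that the filter can be computed from two quantities that are observable: the correlation of the speech + noise input and the correlation of the noise alone. A minimal sketch follows, assuming two microphones, one tap per channel, real-valued signals, no delay and hand-picked correlation matrices; `solve_2x2`, `sdw_mwf` and all numerical values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve_2x2(a: Matrix, b: List[float]) -> List[float]:
    """Solve a 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def sdw_mwf(r_u: Matrix, r_n: Matrix, mu: float, ref: int = 0) -> List[float]:
    """SDW-MWF from speech+noise (r_u) and noise-only (r_n) correlations."""
    # Left-hand matrix: R_u + (mu - 1) R_n
    lhs = [[r_u[i][j] + (mu - 1.0) * r_n[i][j] for j in range(2)] for i in range(2)]
    # Right-hand side: column `ref` of R_u - R_n (speech cross-correlation)
    rhs = [r_u[i][ref] - r_n[i][ref] for i in range(2)]
    return solve_2x2(lhs, rhs)


# Assumed second order statistics (real-valued, L = 1, Delta = 0)
R_S = [[1.0, 0.8], [0.8, 0.64]]     # speech: rank one, gain 0.8 on microphone 2
R_N = [[0.5, 0.1], [0.1, 0.5]]      # noise
R_U = [[R_S[i][j] + R_N[i][j] for j in range(2)] for i in range(2)]

w_bar = sdw_mwf(R_U, R_N, mu=1.0)
print(w_bar)
```

In this toy case the result can be checked against the filter computed directly from the (normally unknown) speech correlation matrix `R_S`, which is what the subtraction of the noise statistics is meant to replace.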

[0043] The present invention is now described in detail. First, the proposed adaptive 
multi-channel noise reduction technique, referred to as Spatially Pre-processed Speech 
Distortion Weighted Multi-channel Wiener filter, is described. 

[0044] A first aspect of the invention is referred to as Speech Distortion Regularised
GSC (SDR-GSC). A new design criterion is developed for the adaptive stage of the
GSC: the ANC design criterion is supplemented with a regularisation term that limits
speech distortion due to signal model errors. In the SDR-GSC, a parameter $\mu$ is
incorporated that allows for a trade-off between speech distortion and noise reduction.
Focussing all attention towards noise reduction results in the standard GSC, while, on
the other hand, focussing all attention towards speech distortion results in the output of
the fixed beamformer. In noise scenarios with low SNR, adaptivity in the SDR-GSC
can be easily reduced or excluded by increasing attention towards speech distortion, i.e.,
by decreasing the parameter $\mu$ to 0. The SDR-GSC is an alternative to the QIC-GSC to
decrease the sensitivity of the GSC to signal model errors such as microphone
mismatch and reverberation. In contrast to the QIC-GSC, the SDR-GSC shifts emphasis
towards speech distortion when the amount of speech leakage grows. In the absence of
signal model errors, the performance of the GSC is preserved. As a result, a better noise
reduction performance is obtained for small model errors, while guaranteeing
robustness against large model errors.

[0045] In a next step, the noise reduction performance of the SDR-GSC is further
improved by adding an extra adaptive filtering operation $\mathbf{w}_0$ on the speech reference
signal. This generalised scheme is referred to as Spatially Pre-processed Speech
Distortion Weighted Multi-channel Wiener Filter (SP-SDW-MWF). The SP-SDW-MWF
is depicted in Fig. 3 and encompasses the MWF as a special case. Again, a
parameter $\mu$ is incorporated in the design criterion to allow for a trade-off between
speech distortion and noise reduction. Focussing all attention towards speech distortion
results in the output of the fixed beamformer. Also here, adaptivity can be easily
reduced or excluded by decreasing $\mu$ to 0. It is shown that, in the absence of speech
leakage and for infinitely long filter lengths, the SP-SDW-MWF corresponds to a
cascade of an SDR-GSC with a Speech Distortion Weighted Single-channel Wiener filter
(SDW-SWF). In the presence of speech leakage, the SP-SDW-MWF with $\mathbf{w}_0$ tries to
preserve its performance: the SP-SDW-MWF then contains extra filtering operations
that compensate for the performance degradation due to speech leakage. Hence, in
contrast to the SDR-GSC (and thus also the GSC), performance does not degrade due to
microphone mismatch. Recursive implementations of the (SDW-)MWF exist that are
based on a GSVD or QR decomposition. Additionally, a subband implementation
results in improved intelligibility at a significantly lower complexity compared to the
fullband approach. These techniques can be extended to implement the SDR-GSC and,
more generally, the SP-SDW-MWF.

[0046] In this invention, cheap time-domain and frequency-domain stochastic
gradient implementations of the SDR-GSC and the SP-SDW-MWF are proposed as
well. Starting from the design criterion of the SDR-GSC, or more generally, the
SP-SDW-MWF, a time-domain stochastic gradient algorithm is derived. To increase the
convergence speed and reduce the computational complexity, the algorithm is
implemented in the frequency-domain. To reduce the large excess error from which the
stochastic gradient algorithm suffers when used in highly non-stationary noise, a low
pass filter is applied to the part of the gradient estimate that limits speech distortion. The
low pass filter avoids a highly time-varying distortion of the desired speech component
while not degrading the tracking performance needed in time-varying noise scenarios.
Experimental results show that the low pass filter significantly improves the
performance of the stochastic gradient algorithm and does not compromise the tracking
of changes in the noise scenario. In addition, experiments demonstrate that the proposed
stochastic gradient algorithm preserves the benefit of the SP-SDW-MWF over the
QIC-GSC, while its computational complexity is comparable to the NLMS based scaled
projection algorithm for implementing the QIC. The stochastic gradient algorithm with
low pass filter however requires data buffers, which results in a large memory cost. The
memory cost can be decreased by approximating the regularisation term in the
frequency-domain using (diagonal) correlation matrices, making an implementation of
the SP-SDW-MWF in commercial hearing aids feasible in terms of both complexity and
memory cost. Experimental results show that the stochastic gradient algorithm
using correlation matrices has the same performance as the stochastic gradient
algorithm with low pass filter.

Spatially pre-processed SDW Multi-channel Wiener Filter
Concept

[0047] Fig. 3 depicts the Spatially pre-processed, Speech Distortion Weighted
Multi-channel Wiener filter (SP-SDW-MWF). The SP-SDW-MWF consists of a fixed, spatial
pre-processor, i.e. a fixed beamformer $A(z)$ and a blocking matrix $B(z)$, and an adaptive
Speech Distortion Weighted Multi-channel Wiener filter (SDW-MWF). Given $M$
microphone signals

$$u_i[k] = u_i^s[k] + u_i^n[k], \quad i = 1, \ldots, M$$ (equation 59)

with $u_i^s[k]$ the desired speech contribution and $u_i^n[k]$ the noise contribution, the fixed
beamformer $A(z)$ creates a so-called speech reference

$$y_0[k] = y_0^s[k] + y_0^n[k]$$ (equation 60)

by steering a beam towards the direction of the desired signal, and comprising a speech
contribution $y_0^s[k]$ and a noise contribution $y_0^n[k]$. To preserve the robustness
advantage of the MWF, the fixed beamformer $A(z)$ should be designed such that the
distortion in the speech reference $y_0^s[k]$ is minimal for all possible errors in the
assumed signal model such as microphone mismatch. In the sequel, a delay-and-sum
beamformer is used. For small-sized arrays, this beamformer offers sufficient
robustness against signal model errors as it minimises the noise sensitivity. Given
statistical knowledge about the signal model errors that occur in practice, a further
optimised filter-and-sum beamformer $A(z)$ can be designed. The blocking matrix $B(z)$
creates $M-1$ so-called noise references

$$y_i[k] = y_i^s[k] + y_i^n[k], \quad i = 1, \ldots, M-1$$ (equation 61)

by steering zeroes towards the direction of interest such that the noise contributions
$y_i^n[k]$ are dominant compared to the speech leakage contributions $y_i^s[k]$. A simple
technique to create the noise references consists of pairwise subtracting the time-aligned
microphone signals. Further optimised noise references can be created, e.g. by
minimising speech leakage for a specified angular region around the direction of
interest instead of for the direction of interest only (e.g. for an angular region from -20°
to 20° around the direction of interest). In addition, given statistical knowledge about
the signal model errors that occur in practice, speech leakage can be minimised for all
possible signal model errors.
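
The simple spatial pre-processor described above, a delay-and-sum beamformer followed by pairwise subtraction, can be sketched as follows. The example assumes that the microphone signals have already been time-aligned towards the desired source, so that the delay-and-sum step reduces to an average; the function name `spatial_preprocessor` and the gain values 1.0, 1.2 and 0.9 are illustrative assumptions.

```python
from typing import List, Tuple


def spatial_preprocessor(mics: List[List[float]]) -> Tuple[List[float], List[List[float]]]:
    """Return (speech reference, noise references) for time-aligned microphone signals."""
    m = len(mics)
    n = len(mics[0])
    # Fixed beamformer: after time alignment, delay-and-sum is a plain average.
    speech_ref = [sum(ch[k] for ch in mics) / m for k in range(n)]
    # Blocking matrix: pairwise differences of adjacent aligned channels.
    noise_refs = [[mics[i][k] - mics[i + 1][k] for k in range(n)]
                  for i in range(m - 1)]
    return speech_ref, noise_refs


speech = [0.0, 1.0, -0.5, 0.25]

# Matched microphones: the speech cancels in every noise reference.
y0, refs = spatial_preprocessor([speech[:] for _ in range(3)])

# Gain mismatch (assumed gains 1.0, 1.2, 0.9): speech leaks into the noise references.
mismatched = [[g * s for s in speech] for g in (1.0, 1.2, 0.9)]
y0_mm, refs_mm = spatial_preprocessor(mismatched)
print(refs, refs_mm)
```

With matched microphones the noise references contain no speech at all, whereas even a moderate gain mismatch produces the speech leakage that the adaptive stage has to be robust against.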



[0048] In the sequel, the superscripts $s$ and $n$ are used to refer to the speech and the
noise contribution of a signal. During periods of speech + noise, the references $y_i[k]$,
$i = 0, \ldots, M-1$ contain speech + noise. During periods of noise only, $y_i[k]$, $i = 0, \ldots, M-1$
only consist of a noise component, i.e. $y_i[k] = y_i^n[k]$. The second order statistics of the
noise signal are assumed to be quite stationary such that they can be estimated during
periods of noise only.



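[0049] The SDW-MWF filter $\mathbf{w}_{0:M-1}$

$$\mathbf{w}_{0:M-1} = \left( \frac{1}{\mu} E\{\mathbf{y}_{0:M-1}^s[k]\, \mathbf{y}_{0:M-1}^{s,H}[k]\} + E\{\mathbf{y}_{0:M-1}^n[k]\, \mathbf{y}_{0:M-1}^{n,H}[k]\} \right)^{-1} E\{\mathbf{y}_{0:M-1}^n[k]\, y_0^{n,*}[k-\Delta]\}$$ (equation 62)

with

$$\mathbf{w}_{0:M-1}^H[k] = [\mathbf{w}_0^H[k] \;\; \mathbf{w}_1^H[k] \;\; \ldots \;\; \mathbf{w}_{M-1}^H[k]]$$ (equation 63)

$$\mathbf{w}_i[k] = [w_i[0] \;\; w_i[1] \;\; \ldots \;\; w_i[L-1]]^T$$ (equation 64)

$$\mathbf{y}_{0:M-1}^H[k] = [\mathbf{y}_0^H[k] \;\; \mathbf{y}_1^H[k] \;\; \ldots \;\; \mathbf{y}_{M-1}^H[k]]$$ (equation 65)

$$\mathbf{y}_i[k] = [y_i[k] \;\; y_i[k-1] \;\; \ldots \;\; y_i[k-L+1]]^T$$ (equation 66)

provides an estimate $\mathbf{w}_{0:M-1}^H \mathbf{y}_{0:M-1}[k]$ of the noise contribution $y_0^n[k-\Delta]$ in the speech
reference by minimising the cost function $J(\mathbf{w}_{0:M-1})$

$$J(\mathbf{w}_{0:M-1}) = \frac{1}{\mu} \underbrace{E\{|\mathbf{w}_{0:M-1}^H \mathbf{y}_{0:M-1}^s[k]|^2\}}_{\varepsilon_d^2} + \underbrace{E\{|y_0^n[k-\Delta] - \mathbf{w}_{0:M-1}^H \mathbf{y}_{0:M-1}^n[k]|^2\}}_{\varepsilon_n^2}$$ (equation 67)

The subscript $0:M-1$ in $\mathbf{w}_{0:M-1}$ and $\mathbf{y}_{0:M-1}$ refers to the subscripts of the first and the last
channel component of the adaptive filter and the input vector, respectively. The term $\varepsilon_d^2$
represents the speech distortion energy and $\varepsilon_n^2$ the residual noise energy. The term
$\frac{1}{\mu}\varepsilon_d^2$ in the cost function (eq.67) limits the possible amount of speech distortion at the output
of the SP-SDW-MWF. Hence, the SP-SDW-MWF adds robustness against signal model
errors to the GSC by taking speech distortion explicitly into account in the design
criterion of the adaptive stage. The parameter $1/\mu \in [0, \infty)$ trades off noise reduction and
speech distortion: the larger $1/\mu$, the smaller the amount of possible speech distortion.
For $\mu = 0$, the output of the fixed beamformer $A(z)$, delayed by $\Delta$ samples, is obtained.
Adaptivity can be easily reduced or excluded in the SP-SDW-MWF by decreasing $\mu$ to
0 (e.g., in noise scenarios with a very low signal-to-noise ratio (SNR) such as -10 dB, a
fixed beamformer may be preferred). Additionally, adaptivity can be limited by
applying a QIC to $\mathbf{w}_{0:M-1}$.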

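[0050] Note that when the fixed beamformer $A(z)$ and the blocking matrix $B(z)$ are
set to

$$A(z) = [1 \;\; 0 \;\; \ldots \;\; 0]^H$$ (equation 68)

$$B(z) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}^H$$ (equation 69)

one obtains the original SDW-MWF that operates on the received microphone signals
$u_i[k]$, $i = 1, \ldots, M$.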
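
This special case is easy to verify numerically: with the first unit vector as beamformer and a zero column followed by an identity matrix as blocking matrix, the references are simply the microphone signals themselves. The sketch below works on one snapshot of real-valued samples; `identity_preprocessor` and `apply_preprocessor` are hypothetical helper names.

```python
from typing import List, Tuple


def identity_preprocessor(m: int) -> Tuple[List[float], List[List[float]]]:
    """Return A (length M) and the (M-1) x M matrix [0 I] whose Hermitian transpose is B."""
    a = [1.0] + [0.0] * (m - 1)
    b_h = [[1.0 if col == row + 1 else 0.0 for col in range(m)]
           for row in range(m - 1)]
    return a, b_h


def apply_preprocessor(a: List[float], b_h: List[List[float]],
                       u: List[float]) -> Tuple[float, List[float]]:
    """Map one snapshot of microphone samples u to (speech ref, noise refs)."""
    y0 = sum(ai * ui for ai, ui in zip(a, u))
    refs = [sum(bij * uj for bij, uj in zip(row, u)) for row in b_h]
    return y0, refs


a, b_h = identity_preprocessor(4)
y0, refs = apply_preprocessor(a, b_h, [0.1, -0.2, 0.3, 0.4])
print(y0, refs)
```

The speech reference equals the first microphone sample and the noise references equal the remaining ones, so the adaptive stage sees exactly the inputs of the original SDW-MWF.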

[0051] Below, the different parameter settings of the SP-SDW-MWF are discussed.
Depending on the setting of the parameter $\mu$ and the presence or the absence of the filter
$\mathbf{w}_0$, the GSC, the (SDW-)MWF as well as in-between solutions such as the Speech
Distortion Regularised GSC (SDR-GSC) are obtained. One distinguishes between two
cases, i.e. the case where no filter $\mathbf{w}_0$ is applied to the speech reference (filter length
$L_0 = 0$) and the case where an additional filter $\mathbf{w}_0$ is used ($L_0 \neq 0$).

SDR-GSC, i.e., SP-SDW-MWF without $\mathbf{w}_0$

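[0052] First, consider the case without $\mathbf{w}_0$, i.e. $L_0 = 0$. The solution for $\mathbf{w}_{1:M-1}$ in
(eq.62) then reduces to

$$\arg\min_{\mathbf{w}_{1:M-1}} \frac{1}{\mu} \underbrace{E\{|\mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^s[k]|^2\}}_{\varepsilon_d^2} + \underbrace{E\{|y_0^n[k-\Delta] - \mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^n[k]|^2\}}_{\varepsilon_n^2}$$ (equation 70)

leading to

$$\mathbf{w}_{1:M-1} = \left( \frac{1}{\mu} E\{\mathbf{y}_{1:M-1}^s[k]\, \mathbf{y}_{1:M-1}^{s,H}[k]\} + E\{\mathbf{y}_{1:M-1}^n[k]\, \mathbf{y}_{1:M-1}^{n,H}[k]\} \right)^{-1} E\{\mathbf{y}_{1:M-1}^n[k]\, y_0^{n,*}[k-\Delta]\}$$ (equation 71)

where $\varepsilon_d^2$ is the speech distortion energy and $\varepsilon_n^2$ the residual noise energy.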

[0053] Compared to the optimisation criterion (eq.) of the GSC, a regularisation term

$$\frac{1}{\mu} E\{|\mathbf{w}_{1:M-1}^H \mathbf{y}_{1:M-1}^s[k]|^2\}$$ (equation 72)

has been added. This regularisation term limits the amount of speech distortion that is
caused by the filter $\mathbf{w}_{1:M-1}$ when speech leaks into the noise references, i.e.
$y_i^s[k] \neq 0$, $i = 1, \ldots, M-1$. In the sequel, the SP-SDW-MWF with $L_0 = 0$ is therefore
referred to as the Speech Distortion Regularized GSC (SDR-GSC). The smaller $\mu$, the
smaller the resulting amount of speech distortion will be. For $\mu = 0$, all emphasis is put
on speech distortion such that $z[k]$ is equal to the output of the fixed beamformer $A(z)$
delayed by $\Delta$ samples. For $\mu = \infty$ all emphasis is put on noise reduction and speech
distortion is not taken into account. This corresponds to the standard GSC. Hence, the
SDR-GSC encompasses the GSC as a special case.

[0054] The regularisation term (eq.72) with $1/\mu \neq 0$ adds robustness to the GSC, while
not affecting the noise reduction performance in the absence of speech leakage:

■ In the absence of speech leakage, i.e., $y_i^s[k] = 0$, $i = 1, \ldots, M-1$, the
regularisation term equals 0 for all $\mathbf{w}_{1:M-1}$ and hence the residual noise energy $\varepsilon_n^2$
is effectively minimised. In other words, in the absence of speech leakage, the
GSC solution is obtained.

■ In the presence of speech leakage, i.e., $y_i^s[k] \neq 0$, $i = 1, \ldots, M-1$, speech
distortion is explicitly taken into account in the optimisation criterion (eq.70) for
the adaptive filter $\mathbf{w}_{1:M-1}$, limiting speech distortion while reducing noise. The
larger the amount of speech leakage, the more attention is paid to speech
distortion.

To limit speech distortion alternatively, a QIC is often imposed on the filter $\mathbf{w}_{1:M-1}$. In
contrast to the SDR-GSC, the QIC acts irrespective of the amount of speech leakage
$y_i^s[k]$ that is present. The constraint value $\beta^2$ in (eq.) has to be chosen based on the
largest model errors that may occur. As a consequence, noise reduction performance is
compromised even when no or very small model errors are present. Hence, the QIC is
more conservative than the SDR-GSC, as will be shown in the experimental results.
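
The self-adjusting behaviour of the regularisation term can be seen in the smallest non-trivial case. The sketch below assumes one noise reference with a single real-valued tap, for which the regularised solution reduces to the scalar $r / (P_{leak}/\mu + P_{noise})$; the function names and all numerical values are assumptions chosen for illustration.

```python
def sdr_gsc_tap(p_noise_ref: float, p_leak: float, r_cross: float, mu: float) -> float:
    """One-tap SDR-GSC filter for a single noise reference (real signals).

    p_noise_ref: noise power in the noise reference, p_leak: speech leakage
    power in the noise reference, r_cross: noise cross-correlation between
    the noise reference and the (delayed) speech reference.
    """
    if mu == 0.0:
        return 0.0                      # all emphasis on speech distortion
    return r_cross / (p_leak / mu + p_noise_ref)


def speech_distortion(w: float, p_leak: float) -> float:
    """Speech distortion energy |w|^2 * (leakage power)."""
    return w * w * p_leak


gsc = sdr_gsc_tap(1.0, 0.0, 0.6, mu=1.0)          # no leakage: plain GSC tap
leaky = [sdr_gsc_tap(1.0, 0.2, 0.6, mu) for mu in (0.1, 1.0, 10.0, 1e9)]
print(gsc, leaky)
```

Without leakage the tap equals the GSC solution for every non-zero $\mu$; with leakage the tap shrinks as $\mu$ decreases, and it shrinks more the larger the leakage power, which is the difference from a fixed norm constraint.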

SP-SDW-MWF with filter $\mathbf{w}_0$

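[0055] Since the SDW-MWF (eq.62) takes speech distortion explicitly into account
in its optimisation criterion, an additional filter $\mathbf{w}_0$ on the speech reference $y_0[k]$ may
be added. The SDW-MWF (eq.62) then solves the following more general optimisation
criterion

$$\mathbf{w}_{0:M-1} = \arg\min_{\mathbf{w}_{0:M-1}} \frac{1}{\mu} E\left\{ \left| [\mathbf{w}_0^H \;\; \mathbf{w}_{1:M-1}^H] \begin{bmatrix} \mathbf{y}_0^s[k] \\ \mathbf{y}_{1:M-1}^s[k] \end{bmatrix} \right|^2 \right\} + E\left\{ \left| y_0^n[k-\Delta] - [\mathbf{w}_0^H \;\; \mathbf{w}_{1:M-1}^H] \begin{bmatrix} \mathbf{y}_0^n[k] \\ \mathbf{y}_{1:M-1}^n[k] \end{bmatrix} \right|^2 \right\}$$ (equation 73)

where $\mathbf{w}_{0:M-1}^H = [\mathbf{w}_0^H \;\; \mathbf{w}_{1:M-1}^H]$ is given by (eq.62).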

[0056] Again, $\mu$ trades off speech distortion and noise reduction. For $\mu = \infty$ speech
distortion is completely ignored, which results in a zero output signal. For $\mu = 0$ all
emphasis is put on speech distortion such that the output signal is equal to the output of
the fixed beamformer delayed by $\Delta$ samples.

In addition, the observation can be made that in the absence of speech leakage, i.e.,
$y_i^s[k] = 0$, $i = 1, \ldots, M-1$, and for infinitely long filters $\mathbf{w}_i$, $i = 0, \ldots, M-1$, the SP-SDW-MWF
(with $\mathbf{w}_0$) corresponds to a cascade of an SDR-GSC and an SDW single-channel WF
(SDW-SWF) postfilter. In the presence of speech leakage, the SP-SDW-MWF (with $\mathbf{w}_0$)
tries to preserve its performance: the SP-SDW-MWF then contains extra filtering
operations that compensate for the performance degradation due to speech leakage. This
is illustrated in Fig. 4. It can e.g. be proven that, for infinite filter lengths, the
performance of the SP-SDW-MWF (with $\mathbf{w}_0$) is not affected by microphone mismatch
as long as the desired speech component at the output of the fixed beamformer $A(z)$
remains unaltered.

Experimental results 

[0057] The theoretical results are now illustrated by means of experimental results
for a hearing aid application. First, the set-up and the performance measures used are
described. Next, the impact of the different parameter settings of the SP-SDW-MWF on
the performance and the sensitivity to signal model errors is evaluated. Comparison is
made with the QIC-GSC.

[0058] Fig. 5 depicts the set-up for the experiments. A Behind-The-Ear (BTE) hearing
aid with three omnidirectional microphones (Knowles FG-3452) has been mounted on a
dummy head in an office room. The interspacing between
the first and the second microphone is about 1 cm and the interspacing between the
second and the third microphone is about 1.5 cm. The reverberation time $T_{60\mathrm{dB}}$ of the
room is about 700 ms for a speech weighted noise. The desired speech signal and the
noise signals are uncorrelated. Both the speech and the noise signal have a level of 70
dB SPL at the centre of the head. The desired speech source and noise sources are
positioned at a distance of 1 meter from the head: the speech source in front of the head
(0°), the noise sources at an angle $\theta$ w.r.t. the speech source (see also Fig. 5). To get an
idea of the average performance based on directivity only, stationary speech and noise
signals with the same, average long-term power spectral density are used. The total
duration of the input signal is 10 seconds of which 5 seconds contain noise only and 5
seconds contain both the speech and the noise signal. For evaluation purposes, the
speech and the noise signal have been recorded separately.

[0059] The microphone signals are pre-whitened prior to processing to improve
intelligibility, and the output is accordingly de-whitened. In the experiments, the
microphones have been calibrated by means of recordings of an anechoic speech
weighted noise signal positioned at 0°, measured while the microphone array is
mounted on the head. A delay-and-sum beamformer is used as a fixed beamformer,
since, in case of small microphone interspacing, it is known to be very robust to model
errors. The blocking matrix $B$ pairwise subtracts the time aligned calibrated microphone
signals.

[0060] To investigate the effect of the different parameter settings (i.e. $\mu$, $\mathbf{w}_0$) on the
performance, the filter coefficients are computed using (eq.62) where
$E\{\mathbf{y}_{0:M-1}^s \mathbf{y}_{0:M-1}^{s,H}\}$ is
estimated by means of the clean speech contributions of the microphone signals. In
practice, $E\{\mathbf{y}_{0:M-1}^s \mathbf{y}_{0:M-1}^{s,H}\}$ is approximated using (eq.). The effect of the approximation
(eq.) on the performance was found to be small (i.e. differences of at most 0.5 dB in
intelligibility weighted SNR improvement) for the given data set. The QIC-GSC is
implemented using variable loading RLS. The filter length $L$ per channel equals 96.

[0061] To assess the performance of the different approaches, the broadband 
intelligibility weighted SNR improvement is used, defined as 

\Delta SNR_{intellig} = \sum_i I_i (SNR_{i,out} - SNR_{i,in}), (equation 74)

where the band importance function I_i expresses the importance of the i-th one-third
octave band with centre frequency f_i^c for intelligibility, SNR_{i,out} is the output SNR (in
dB) and SNR_{i,in} is the input SNR (in dB) in the i-th one-third octave band ('ANSI S3.5-1997,
American National Standard Methods for Calculation of the Speech Intelligibility
Index'). The intelligibility weighted SNR reflects how much intelligibility is improved
by the noise reduction algorithm, but does not take into account speech distortion.

[0062] To measure the amount of speech distortion, we define the following
intelligibility weighted spectral distortion measure

SD_{intellig} = \sum_i I_i SD_i (equation 75)

with SD_i the average spectral distortion (dB) in the i-th one-third octave band, measured as

SD_i = \frac{1}{(2^{1/6} - 2^{-1/6}) f_i^c} \int_{2^{-1/6} f_i^c}^{2^{1/6} f_i^c} |10 \log_{10} G^s(f)| \, df, (equation 76)

with G^s(f) the power transfer function of speech from the input to the output of the noise
reduction algorithm. To exclude the effect of the spatial pre-processor, the performance 
measures are calculated w.r.t. the output of the fixed beamformer. 
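
The two performance measures can be sketched in a few lines of Python. This is only an illustration: the band importance weights, band centres and SNR values below are hypothetical placeholders (not the ANSI S3.5-1997 values), the function names are invented, and the per-band distortion is computed as a numerical average of |10 log10 G(f)| over a one-third octave band with edges at 2^(-1/6) and 2^(1/6) times the centre frequency.

```python
import math

def weighted_snr_improvement(importance, snr_in_db, snr_out_db):
    """Band-importance-weighted sum of the per-band SNR gains (in dB)."""
    return sum(w * (out - inp)
               for w, inp, out in zip(importance, snr_in_db, snr_out_db))

def band_distortion(power_transfer, centre_hz, n_points=400):
    """Average of |10 log10 G(f)| over the one-third octave band at centre_hz."""
    lo = centre_hz * 2 ** (-1 / 6)
    hi = centre_hz * 2 ** (1 / 6)
    step = (hi - lo) / n_points
    # Midpoint rule; dividing by the bandwidth turns the integral into an average.
    total = sum(abs(10 * math.log10(power_transfer(lo + (j + 0.5) * step)))
                for j in range(n_points)) * step
    return total / (hi - lo)

def weighted_distortion(importance, power_transfer, centres_hz):
    """Band-importance-weighted sum of the per-band spectral distortions (dB)."""
    return sum(w * band_distortion(power_transfer, fc)
               for w, fc in zip(importance, centres_hz))

importance = [0.2, 0.5, 0.3]           # hypothetical weights, NOT the ANSI values
centres_hz = [500.0, 1000.0, 2000.0]   # hypothetical band centres
snr_in_db = [0.0, 2.0, -1.0]           # made-up input SNRs per band
snr_out_db = [5.0, 8.0, 1.0]           # made-up output SNRs per band

delta_snr = weighted_snr_improvement(importance, snr_in_db, snr_out_db)
# A flat power transfer of 0.5 attenuates the speech by about 3.01 dB in every band.
sd = weighted_distortion(importance, lambda f: 0.5, centres_hz)
```

With these toy numbers the weighted SNR improvement is 0.2·5 + 0.5·6 + 0.3·2 = 4.6 dB, and the weighted distortion equals 10 log10(2), about 3.01 dB, because the weights sum to one.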

[0063] The impact of the different parameter settings for μ and w0 on the
performance of the SP-SDW-MWF is illustrated for a five noise source scenario. The 
five noise sources are positioned at angles 75°, 120°, 180°, 240°, 285° w.r.t. the desired 
source at 0°. To assess the sensitivity of the algorithm against errors in the assumed 
signal model, the influence of microphone mismatch, e.g., gain mismatch of the second 
microphone, on the performance is evaluated. Among the different possible signal 
model errors, microphone mismatch was found to be especially harmful to the
performance of the GSC in a hearing aid application. In hearing aids, microphones are 
rarely matched in gain and phase. Gain and phase differences between microphone 
characteristics of up to 6 dB and 10°, respectively, have been reported. 

SP-SDW-MWF without w0 (SDR-GSC)

[0064] Fig. 6 plots the improvement ΔSNR_intellig and the speech distortion SD_intellig as
a function of 1/μ obtained by the SDR-GSC (i.e., the SP-SDW-MWF without filter w0)
for different gain mismatches at the second microphone. In the absence of
microphone mismatch, the amount of speech leakage into the noise references is
limited. Hence, the amount of speech distortion is low for all μ. Since there is still a
small amount of speech leakage due to reverberation, the amount of noise reduction and
speech distortion slightly decreases for increasing 1/μ, especially for 1/μ > 1. In the
presence of microphone mismatch, the amount of speech leakage into the noise
references grows. For 1/μ=0 (GSC), the speech gets significantly distorted. Due to the
cancellation of the desired signal, the improvement ΔSNR_intellig also degrades. Setting
1/μ>0 improves the performance of the GSC in the presence of model errors without
compromising performance in the absence of signal model errors. For the given set-up,
a value 1/μ around 0.5 seems appropriate for guaranteeing good performance for a gain
mismatch up to 4 dB.

SP-SDW-MWF with filter w0

[0065] Fig. 7 plots the performance measures ΔSNR_intellig and SD_intellig of the SP-SDW-MWF
with filter w0. In general, the amount of speech distortion and noise
reduction grows for decreasing 1/μ. For 1/μ=0, all emphasis is put on noise reduction.
As also illustrated by Fig. 7, this results in a total cancellation of the speech and the
noise signal and hence degraded performance. In the absence of model errors, the
settings L0=0 and L0≠0 result - except for 1/μ=0 - in the same ΔSNR_intellig, while the
distortion for the SP-SDW-MWF with w0 is higher due to the additional single-channel
SDW-SWF. For L0≠0 the performance does - in contrast to L0=0 - not degrade due to the
microphone mismatch.

[0066] Fig. 8 depicts the improvement ΔSNR_intellig and the speech distortion SD_intellig,
respectively, of the QIC-GSC as a function of β². Like the SDR-GSC, the QIC increases
the robustness of the GSC. The QIC is independent of the amount of speech leakage. As
a consequence, distortion grows fast with increasing gain mismatch. The constraint
value β should be chosen such that the maximum allowable speech distortion level is
not exceeded for the largest possible model errors. Obviously, this goes at the expense
of reduced noise reduction for small model errors. The SDR-GSC on the other hand,
keeps the speech distortion limited for all model errors (see Fig. 6). Emphasis on speech
distortion is increased if the amount of speech leakage grows. As a result, a better noise
reduction performance is obtained for small model errors, while guaranteeing sufficient
robustness for large model errors. In addition, Fig. 7 demonstrates that an additional
filter w0 significantly improves the performance in the presence of signal model errors.

[0067] In the previously discussed embodiments a generalised noise reduction
scheme has been established, referred to as Spatially Pre-processed Speech Distortion
Weighted Multi-channel Wiener Filter (SP-SDW-MWF), that comprises a fixed, spatial
pre-processor and an adaptive stage that is based on a SDW-MWF. The new scheme
encompasses the GSC and MWF as special cases. In addition, it allows for an in-between
solution that can be interpreted as a Speech Distortion Regularised GSC (SDR-GSC).
Depending on the setting of a trade-off parameter μ and the presence or absence
of the filter w0 on the speech reference, the GSC, the SDR-GSC or a (SDW-)MWF is
obtained.

The different parameter settings of the SP-SDW-MWF can be interpreted as follows: 

• Without w0, the SP-SDW-MWF corresponds to an SDR-GSC: the
ANC design criterion is supplemented with a regularisation term that limits the
speech distortion due to signal model errors. The larger 1/μ, the smaller the
amount of distortion. For 1/μ=0, distortion is completely ignored, which
corresponds to the GSC-solution. The SDR-GSC is then an alternative technique
to the QIC-GSC to decrease the sensitivity of the GSC to signal model errors. In
contrast to the QIC-GSC, the SDR-GSC shifts emphasis towards speech 
distortion when the amount of speech leakage grows. In the absence of signal 
model errors, the performance of the GSC is preserved. As a result, a better 
noise reduction performance is obtained for small model errors, while 
guaranteeing robustness against large model errors. 

• Since the SP-SDW-MWF takes speech distortion explicitly into 
account, a filter w0 on the speech reference can be added. It can be shown that -
in the absence of speech leakage and for infinitely long filter lengths - the SP-SDW-MWF
corresponds to a cascade of an SDR-GSC with an SDW-SWF
postfilter. In the presence of speech leakage, the SP-SDW-MWF with w0 tries to
preserve its performance: the SP-SDW-MWF then contains extra filtering 
operations that compensate for the performance degradation due to speech 
leakage. In contrast to the SDR-GSC (and thus also the GSC), the performance 
does not degrade due to microphone mismatch. 

Experimental results for a hearing aid application confirm the theoretical results. The 
SP-SDW-MWF indeed increases the robustness of the GSC against signal model errors. 
A comparison with the widely studied QIC-GSC demonstrates that the SP-SDW-MWF 
achieves a better noise reduction performance for a given maximum allowable speech 
distortion level.

Stochastic gradient implementations

[0068] Recursive implementations of the (SDW-)MWF have been proposed based 
on a GSVD or QR decomposition. Additionally, a subband implementation results in 
improved intelligibility at a significantly lower cost compared to the fullband approach. 
These techniques can be extended to implement the SP-SDW-MWF. However, in 
contrast to the GSC and the QIC-GSC, no cheap stochastic gradient based 
implementation of the SP-SDW-MWF is available. In the present invention, time- 
domain and frequency-domain stochastic gradient implementations of the SP-SDW- 
MWF are proposed that preserve the benefit of matrix-based SP-SDW-MWF over QIC- 
GSC. Experimental results demonstrate that the proposed stochastic gradient 
implementations of the SP-SDW-MWF outperform the SPA, while their computational 
cost is limited. 

[0069] Starting from the cost function of the SP-SDW-MWF, a time-domain
stochastic gradient algorithm is derived. To increase the convergence speed and reduce 
the computational complexity, the stochastic gradient algorithm is implemented in the 
frequency-domain. Since the stochastic gradient algorithm suffers from a large excess 
error when applied in highly time-varying noise scenarios, the performance is improved 
by applying a low pass filter to the part of the gradient estimate that limits speech 
distortion. The low pass filter avoids a highly time-varying distortion of the desired 
speech component while not degrading the tracking performance needed in time- 
varying noise scenarios. Next, the performance of the different frequency-domain 
stochastic gradient algorithms is compared. Experimental results show that the proposed 
stochastic gradient algorithm preserves the benefit of the SP-SDW-MWF over the QIC- 
GSC. Finally, it is shown that the memory cost of the frequency-domain stochastic 
gradient algorithm with low pass filter is reduced by approximating the regularisation 
term in the frequency-domain using (diagonal) correlation matrices instead of data 
buffers. Experiments show that the stochastic gradient algorithm using correlation 
matrices has the same performance as the stochastic gradient algorithm with low pass
filter.

Stochastic gradient algorithm



Derivation 

[0070] A stochastic gradient algorithm approximates the steepest descent algorithm, 
using an instantaneous gradient estimate. Given the cost function (eq.67), the steepest 
descent algorithm iterates as follows (note that in the sequel the subscripts 0:M-1 in the
adaptive filter w_{0:M-1} and the input vector y_{0:M-1} are omitted for the sake of conciseness):

w[n+1] = w[n] + \frac{\rho}{2} \left( -\frac{\partial J(w)}{\partial w} \right)_{w=w[n]}
       = w[n] + \rho \left( E\{ y^n[k] ( y_0^{n,*}[k-\Delta] - y^{n,H}[k] w[n] ) \} - \frac{1}{\mu} E\{ y^s[k] y^{s,H}[k] \} w[n] \right) (equation 77)

with w[k], y[k] ∈ C^{NL×1}, where N denotes the number of input channels to the adaptive
filter and L the number of filter taps per channel. Replacing the iteration index n by a
time index k and leaving out the expectation values E{.}, one obtains the following
update equation

w[k+1] = w[k] + \rho \left\{ y^n[k] ( y_0^{n,*}[k-\Delta] - y^{n,H}[k] w[k] ) - \underbrace{ \frac{1}{\mu} y^s[k] y^{s,H}[k] w[k] }_{r[k]} \right\} (equation 78)

For 1/μ=0 and no filter w0 on the speech reference, (eq.78) reduces to the update
formula used in GSC during periods of noise only (i.e., when
y_i[k] = y_i^n[k], i = 0,...,M-1). The additional term r[k] in the gradient estimate limits
the speech distortion due to possible signal model errors.

[0071] Equation (78) requires knowledge of the correlation matrix y^s[k] y^{s,H}[k] or
E{y^s[k] y^{s,H}[k]} of the clean speech. In practice, this information is not available. To
avoid the need for calibration, speech + noise signal vectors y_buf1 are stored into a
circular buffer B1 ∈ R^{N×L_buf1} during processing. During periods of noise only (i.e., when
y_i[k] = y_i^n[k], i = 0,...,M-1), the filter w is updated using the following approximation of
the term r[k] = (1/μ) y^s[k] y^{s,H}[k] w[k] in (eq.78)

\frac{1}{\mu} y^s[k] y^{s,H}[k] w[k] \approx \frac{1}{\mu} ( y_{buf1}[k] y_{buf1}^H[k] - y[k] y^H[k] ) w[k], (equation 79)

which results in the update formula

w[k+1] = w[k] + \rho \left\{ y[k] ( y_0^*[k-\Delta] - y^H[k] w[k] ) - \underbrace{ \frac{1}{\mu} ( y_{buf1}[k] y_{buf1}^H[k] - y[k] y^H[k] ) w[k] }_{r[k]} \right\} (equation 80)

In the sequel, a normalised step size ρ is used, i.e.

\rho = \frac{\rho'}{ \frac{1}{\mu} | y_{buf1}^H[k] y_{buf1}[k] - y^H[k] y[k] | + y^H[k] y[k] + \delta }, (equation 81)

where δ is a small positive constant. The absolute value |y_buf1^H y_buf1 - y^H y| has been
inserted to guarantee a positive valued estimate of the clean speech energy
y^{s,H}[k] y^s[k]. Additional storage of noise only vectors y_buf2 in a second buffer
B2 ∈ R^{M×L_buf2} allows w to be adapted also during periods of speech + noise, using

w[k+1] = w[k] + \rho \left\{ y_{buf2}[k] ( y_{0,buf2}^*[k-\Delta] - y_{buf2}^H[k] w[k] ) - \frac{1}{\mu} ( y[k] y^H[k] - y_{buf2}[k] y_{buf2}^H[k] ) w[k] \right\} (equation 82)

with

\rho = \frac{\rho'}{ \frac{1}{\mu} | y^H[k] y[k] - y_{buf2}^H[k] y_{buf2}[k] | + y_{buf2}^H[k] y_{buf2}[k] + \delta }. (equation 83)

For reasons of conciseness only the update procedure of the time-domain stochastic
gradient algorithms during noise only will be considered in the sequel, hence y[k] =
y^n[k]. The extension towards updating during speech + noise periods with the use of a
second, noise only buffer B2 is straightforward: the equations are found by replacing the
noise-only input vector y[k] by y_buf2[k] and the speech + noise vector y_buf1[k] by the
input speech + noise vector y[k].

It can be shown that the algorithm (eq.80)-(eq.81) is convergent in the mean provided
that the step size ρ is smaller than 1/λ_max with λ_max the maximum eigenvalue of
E{(1/μ) y_buf1 y_buf1^H + (1 - 1/μ) y y^H}. The similarity of (eq.80) with standard NLMS lets us
presume that setting ρ < 2 / \sum_{i=1}^{NL} λ_i, with λ_i, i=1,...,NL the eigenvalues of
E{(1/μ) y_buf1 y_buf1^H + (1 - 1/μ) y y^H}, or - in case of FIR filters - setting

\rho < \frac{2}{ \frac{1}{\mu} E\{ y_{buf1}^H[k] y_{buf1}[k] \} + (1 - \frac{1}{\mu}) E\{ y^H[k] y[k] \} }, (equation 84)

guarantees convergence in the mean square. Equation (84) explains the normalisation
(eq.52) and (eq.83) for the step size ρ.
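
The noise-only update with the buffer-based regularisation term and the normalised step size can be sketched as follows. This is a minimal illustration under simplifying assumptions: all signals are real-valued (so Hermitian transposes become plain transposes and conjugates drop out), the vectors are ordinary Python lists, and the function and argument names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equally long real vectors."""
    return sum(x * y for x, y in zip(a, b))

def sg_update(w, y, y0_delayed, y_buf1, inv_mu, rho_prime, delta=1e-8):
    """One noise-only update of the adaptive filter w (real-valued sketch).

    w          : current filter coefficients (length NL)
    y          : current noise-only input vector
    y0_delayed : delayed speech reference sample
    y_buf1     : speech + noise vector taken from the first buffer
    inv_mu     : trade-off parameter 1/mu (0 gives a plain NLMS-type update)
    """
    err = y0_delayed - dot(y, w)               # residual after noise cancellation
    proj_buf = dot(y_buf1, w)                  # y_buf1^T w
    proj_noise = dot(y, w)                     # y^T w
    # r ~ (1/mu) (y_buf1 y_buf1^T - y y^T) w, without forming the matrices
    r = [inv_mu * (b * proj_buf - n * proj_noise) for b, n in zip(y_buf1, y)]
    e_buf = dot(y_buf1, y_buf1)
    e_noise = dot(y, y)
    # Normalised step size; the absolute value keeps the speech energy estimate positive.
    rho = rho_prime / (inv_mu * abs(e_buf - e_noise) + e_noise + delta)
    return [wi + rho * (n * err - ri) for wi, n, ri in zip(w, y, r)]

# 1/mu = 0: the regularisation vanishes and the step is an NLMS-type step.
gsc_step = sg_update([0.0, 0.0], [1.0, 0.0], 2.0, [3.0, 4.0],
                     inv_mu=0.0, rho_prime=0.5)

# 1/mu = 1 with speech present only in the buffer: the filter is pulled towards
# zero, which limits the distortion of the leaked speech.
shrink_step = sg_update([1.0, 0.0], [0.0, 0.0], 0.0, [1.0, 0.0],
                        inv_mu=1.0, rho_prime=0.5)
```

In the first call the coefficient moves from 0 to about 1.0; in the second it shrinks from 1.0 to about 0.5.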

[0072] However, since generally

y[k] y^H[k] \neq y_{buf1}^n[k] y_{buf1}^{n,H}[k], (equation 85)

the instantaneous gradient estimate in (eq.80) is - compared to (eq.78) - additionally
perturbed by

\frac{1}{\mu} ( y[k] y^H[k] - y_{buf1}^n[k] y_{buf1}^{n,H}[k] ) w[k] (equation 86)

for 1/μ≠0. Hence, for 1/μ≠0, the update equations (eq.80)-(eq.54) suffer from a larger
residual excess error than (eq.78). This additional excess error grows for decreasing μ,
increasing step size ρ and increasing vector length LN of the vector y. It is expected to
be especially large for highly non-stationary noise, e.g. multi-talker babble noise.
Remark that for μ>1, an alternative stochastic gradient algorithm can be derived from
algorithm (eq.80)-(eq.54) by invoking some independence assumptions. Simulations,
however, showed that these independence assumptions result in a significant
performance degradation, while hardly reducing the computational complexity.



Frequency-domain implementation 



[0073] As stated before, the stochastic gradient algorithm (eq.80)-(eq.54) is expected
to suffer from a large excess error for large ρ'/μ and/or highly time-varying noise, due
to a large difference between the rank-one noise correlation matrices y^n[k] y^{n,H}[k]
measured at different time instants k. The gradient estimate can be improved by
replacing

y_{buf1}[k] y_{buf1}^H[k] - y[k] y^H[k] (equation 87)

in (eq.80) with the time-average

\frac{1}{K} \sum_{l=k-K+1}^{k} y_{buf1}[l] y_{buf1}^H[l] - \frac{1}{K} \sum_{l=k-K+1}^{k} y[l] y^H[l], (equation 88)

where \frac{1}{K} \sum_{l=k-K+1}^{k} y_{buf1}[l] y_{buf1}^H[l] is updated during periods of speech + noise and
\frac{1}{K} \sum_{l=k-K+1}^{k} y[l] y^H[l] during periods of noise only. However, this would require
expensive matrix operations. A block-based implementation intrinsically performs this
averaging:

w[(k+1)K] = w[kK] + \frac{\rho}{K} \left[ \sum_{i=0}^{K-1} y[kK+i] ( y_0^*[kK+i-\Delta] - y^H[kK+i] w[kK] ) - \frac{1}{\mu} \sum_{i=0}^{K-1} ( y_{buf1}[kK+i] y_{buf1}^H[kK+i] - y[kK+i] y^H[kK+i] ) w[kK] \right] (equation 89)

The gradient and hence also y_buf1[k] y_buf1^H[k] - y[k] y^H[k] is averaged over K iterations
prior to making adjustments to w. This goes at the expense of a reduced (i.e. by a factor
K) convergence rate.
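
The averaging effect of a block-based update can be illustrated with a short sketch. It assumes real-valued signals, a fixed (not normalised) step size and hypothetical function names; the filter is held constant within a block and adjusted once with the mean of the K instantaneous gradient estimates.

```python
def dot(a, b):
    """Inner product of two equally long real vectors."""
    return sum(x * y for x, y in zip(a, b))

def block_update(w, noise_vecs, y0_delayed, buf_vecs, inv_mu, rho):
    """Apply one filter adjustment using the gradient averaged over a block.

    noise_vecs : K noise-only input vectors of the block
    y0_delayed : K delayed speech reference samples
    buf_vecs   : K speech + noise vectors taken from the buffer
    """
    n_block = len(noise_vecs)
    grad = [0.0] * len(w)
    for y, d, yb in zip(noise_vecs, y0_delayed, buf_vecs):
        err = d - dot(y, w)                 # w is held fixed within the block
        proj_buf = dot(yb, w)
        proj_noise = dot(y, w)
        for j in range(len(w)):
            grad[j] += y[j] * err - inv_mu * (yb[j] * proj_buf - y[j] * proj_noise)
    return [wj + rho * g / n_block for wj, g in zip(w, grad)]

# Two samples whose reference values are perturbed by +1 and -1 around 2.0:
# the perturbations cancel in the averaged gradient.
w_block = block_update([0.0, 0.0],
                       noise_vecs=[[1.0, 0.0], [1.0, 0.0]],
                       y0_delayed=[3.0, 1.0],
                       buf_vecs=[[0.0, 0.0], [0.0, 0.0]],
                       inv_mu=0.0, rho=0.5)

# Reference: a single unperturbed sample gives the same adjustment.
w_single = block_update([0.0, 0.0], [[1.0, 0.0]], [2.0], [[0.0, 0.0]],
                        inv_mu=0.0, rho=0.5)
```

Both calls return the same filter, which shows how sample-to-sample fluctuations are smoothed, at the price of only one adjustment per K input samples.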

[0074] The block-based implementation is computationally more efficient when it is 
implemented in the frequency-domain, especially for large filter lengths : the linear 
convolutions and correlations can then be efficiently realised by FFT algorithms based 
on overlap-save or overlap-add. In addition, in a frequency-domain implementation,
each frequency bin gets its own step size, resulting in faster convergence compared to a
time-domain implementation while not degrading the steady-state excess MSE. 
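
The overlap-save mechanism itself can be sketched as follows. This is an illustration only, not the adaptive algorithm: a naive O(n²) DFT from the standard library stands in for an FFT, the filter is a plain FIR convolution, and the function names are hypothetical. A 2L-point transform of the previous and current input blocks is multiplied by the transform of the zero-padded L-tap filter, and only the last L output samples are kept because they are free of circular wrap-around.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (stand-in for an FFT in this sketch)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n)
               for t in range(n))
           for f in range(n)]
    return [v / n for v in out] if inverse else out

def overlap_save_block(prev_block, cur_block, w):
    """Filter the current block of L samples with the L-tap filter w."""
    n_taps = len(w)
    spectrum = dft(prev_block + cur_block)        # 2L-point transform of [old | new]
    filt = dft(w + [0.0] * n_taps)                # zero-padded filter
    y_time = dft([a * b for a, b in zip(spectrum, filt)], inverse=True)
    return [v.real for v in y_time[n_taps:]]      # keep the last L samples

def direct_fir(x, w, n):
    """Reference: time-domain FIR output at sample n (n >= len(w) - 1)."""
    return sum(w[j] * x[n - j] for j in range(len(w)))

prev = [1.0, 2.0, 3.0, 4.0]
cur = [5.0, 6.0, 7.0, 8.0]
w = [0.5, -1.0, 0.25, 2.0]

fast = overlap_save_block(prev, cur, w)
slow = [direct_fir(prev + cur, w, n) for n in range(len(w), 2 * len(w))]
```

The frequency-domain result matches the direct time-domain filter output for the current block up to rounding error.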

[0075] Algorithm 1 summarises a frequency-domain implementation based on
overlap-save of (eq.80)-(eq.54). Algorithm 1 requires (3N+4) FFTs of length 2L. By
storing the FFT-transformed speech + noise and noise only vectors in the buffers
B1 ∈ C^{N×L_buf1} and B2 ∈ C^{N×L_buf2}, respectively, instead of storing the time-domain vectors,
FFT operations can be saved. Note that since the input signals are real, half of the
FFT components are complex-conjugated. Hence, in practice only half of the complex
FFT components have to be stored in memory. When adapting during speech + noise,
also the time-domain vector

[ y_0[kL-\Delta] \; \cdots \; y_0[kL-\Delta+L-1] ]^T (equation 90)

should be stored in an additional real-valued buffer B2,0 during periods of noise-only,
which - for N=M - results in an additional storage of words compared to when the
time-domain vectors are stored into the buffers B1 and B2.

Remark that in Algorithm 1 a common trade-off parameter μ is used in all frequency
bins. Alternatively, a different setting for μ can be used in different frequency bins. E.g.
for SP-SDW-MWF with w0=0, 1/μ could be set to 0 at those frequencies where the
GSC is sufficiently robust, e.g., for small-sized arrays at high frequencies. In that case,
only a few frequency components of the regularisation terms R_i[k], i=M-N,...,M-1, need
to be computed, reducing the computational complexity.

Algorithm 1: Frequency-domain stochastic gradient SP-SDW-MWF based on overlap-save

Initialisation:

W_i[0] = [0 \cdots 0]^T, i = M-N,...,M-1
P_m[0] = \delta_m, m = 0,...,2L-1

Matrix definitions:

g = \begin{bmatrix} I_L & 0_L \\ 0_L & 0_L \end{bmatrix}; k = [0_L \; I_L]; F = 2L×2L DFT matrix

For each new block of NL input samples:

♦ If noise detected:

1. F [y_i[kL-L] \cdots y_i[kL+L-1]]^T, i = M-N,...,M-1 → noise buffer B2
   [y_0[kL-\Delta] \cdots y_0[kL-\Delta+L-1]]^T → noise buffer B2,0

2. Y_i^n[k] = diag\{ F [y_i[kL-L] \cdots y_i[kL+L-1]]^T \}, i = M-N,...,M-1
   d[k] = [y_0[kL-\Delta] \cdots y_0[kL-\Delta+L-1]]^T
   Create Y_i[k] from data in speech + noise buffer B1.

♦ If speech detected:

1. F [y_i[kL-L] \cdots y_i[kL+L-1]]^T, i = M-N,...,M-1 → speech + noise buffer B1
2. Y_i[k] = diag\{ F [y_i[kL-L] \cdots y_i[kL+L-1]]^T \}, i = M-N,...,M-1
   Create d[k] and Y_i^n[k] from noise buffer B2,0 and B2.

♦ Update formula:

1. e_1[k] = k F^{-1} \sum_{j=M-N}^{M-1} Y_j^n[k] W_j[k] = y_{out,1}
   e[k] = d[k] - e_1[k]
   e_2[k] = k F^{-1} \sum_{j=M-N}^{M-1} Y_j[k] W_j[k] = y_{out,2}
   E_1[k] = F k^T e_1[k]; E_2[k] = F k^T e_2[k]; E[k] = F k^T e[k]

2. \Lambda[k] = \frac{2\rho'}{L} diag\{ P_0^{-1}[k], ..., P_{2L-1}^{-1}[k] \}

3. W_i[k+1] = W_i[k] + F g F^{-1} \Lambda[k] \{ Y_i^{n,H}[k] E[k] - \frac{1}{\mu} ( Y_i^H[k] E_2[k] - Y_i^{n,H}[k] E_1[k] ) \}
   (i = M-N,...,M-1)

♦ Output: y_0[k] = [y_0[kL-\Delta] \cdots y_0[kL-\Delta+L-1]]^T

• If noise detected: y_out[k] = y_0[k] - y_out,1[k]

• If speech detected: y_out[k] = y_0[k] - y_out,2[k]

Improvement 1 : stochastic gradient algorithm with low pass filter 

[0076] For spectrally stationary noise, the limited (i.e. K=L) averaging of (eq.88) by
the block-based and frequency-domain stochastic gradient implementation may offer a
reasonable estimate of the short-term speech correlation matrix E{y^s y^{s,H}}. However, in
practical scenarios, the speech and the noise signals are often spectrally highly non-stationary
(e.g. multi-talker babble noise) while their long-term spectral and spatial
characteristics (e.g. the positions of the sources) usually vary more slowly in time. For
these scenarios, a reliable estimate of the long-term speech correlation matrix
E{y^s y^{s,H}} that captures the spatial rather than the short-term spectral characteristics can
still be obtained by averaging (eq.88) over K≫L samples. Spectrally highly non-stationary
noise can then still be spatially suppressed by using an estimate of the long-term
speech correlation matrix in the regularisation term r[k]. A cheap method to
incorporate a long-term averaging (K≫L) of (eq.88) in the stochastic gradient
algorithm is now proposed, by low pass filtering the part of the gradient estimate that
takes speech distortion into account (i.e. the term r[k] in (eq.80)). The averaging method
is first explained for the time-domain algorithm (eq.80)-(eq.54) and then translated to
the frequency-domain implementation.

Assume that the long-term spectral and spatial characteristics of the noise are quasi-stationary
during at least K speech + noise samples and K noise samples. A reliable
estimate of the long-term speech correlation matrix (1/μ) E{y^s y^{s,H}} is then obtained by
(eq.88) with K≫L. To avoid expensive matrix computations, r[k] can be approximated
by

\frac{1}{\mu K} \sum_{l=k-K+1}^{k} ( y_{buf1}[l] y_{buf1}^H[l] - y[l] y^H[l] ) w[l]. (equation 91)

Since the filter coefficients w of a stochastic gradient algorithm vary slowly in time,
(eq.91) appears a good approximation of r[k], especially for small step size ρ'.
The averaging operation (eq.91) is performed by applying a low pass filter to r[k] in
(eq.80):

r[k] = \lambda r[k-1] + (1-\lambda) \frac{1}{\mu} ( y_{buf1}[k] y_{buf1}^H[k] - y[k] y^H[k] ) w[k], (equation 92)

where λ < 1. This corresponds to an averaging window K of about 1/(1-λ) samples. The
normalised step size ρ is modified into

\rho = \frac{\rho'}{ r_{avg}[k] + y^H[k] y[k] + \delta } (equation 93)

r_{avg}[k] = \lambda r_{avg}[k-1] + (1-\lambda) \frac{1}{\mu} | y_{buf1}^H[k] y_{buf1}[k] - y^H[k] y[k] | (equation 94)

Compared to (eq.80), (eq.92) requires 3NL-1 additional MAC and extra storage of the
NL×1 vector r[k].
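
The low pass filtering of the regularisation term can be sketched as a first-order recursion. The sketch assumes real-valued signals and hypothetical function names; the effective window length of about 1/(1-λ) samples is the usual rule of thumb for exponential averaging.

```python
def dot(a, b):
    """Inner product of two equally long real vectors."""
    return sum(x * y for x, y in zip(a, b))

def lowpass_regulariser(r_prev, w, y, y_buf1, inv_mu, lam):
    """r[k] = lam * r[k-1] + (1 - lam) * (1/mu) (y_buf1 y_buf1^T - y y^T) w."""
    proj_buf = dot(y_buf1, w)
    proj_noise = dot(y, w)
    return [lam * rp + (1 - lam) * inv_mu * (b * proj_buf - n * proj_noise)
            for rp, b, n in zip(r_prev, y_buf1, y)]

def averaging_window(lam):
    """Approximate number of samples spanned by the exponential window."""
    return 1.0 / (1.0 - lam)

# Toy case with a constant instantaneous term equal to [1.0, 0.0].
w = [1.0, 0.0]
y = [0.0, 0.0]
y_buf1 = [1.0, 0.0]

first = lowpass_regulariser([0.0, 0.0], w, y, y_buf1, inv_mu=1.0, lam=0.9)

r = [0.0, 0.0]
for _ in range(200):
    r = lowpass_regulariser(r, w, y, y_buf1, inv_mu=1.0, lam=0.9)

window = averaging_window(0.999)
```

After one step the filtered term has moved only a fraction (1-λ) of the way to the instantaneous value; after many steps it settles on it. For λ=0.999 the window spans about 1000 samples.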

[0077] Equation (92) can be easily extended to the frequency-domain. The update
equation for W_i[k+1] in Algorithm 1 then becomes (Algorithm 2):

W_i[k+1] = W_i[k] + F g F^{-1} \Lambda[k] ( Y_i^{n,H}[k] E[k] - R_i[k] );
R_i[k] = \lambda R_i[k-1] + (1-\lambda) \frac{1}{\mu} ( Y_i^H[k] E_2[k] - Y_i^{n,H}[k] E_1[k] ) (equation 95)

with

E[k] = F k^T \left( y_0^n[k] - k F^{-1} \sum_{j=M-N}^{M-1} Y_j^n[k] W_j[k] \right) (equation 96)

E_1[k] = F k^T k F^{-1} \sum_{j=M-N}^{M-1} Y_j^n[k] W_j[k]; (equation 97)

E_2[k] = F k^T k F^{-1} \sum_{j=M-N}^{M-1} Y_j[k] W_j[k]. (equation 98)

and Λ[k] computed as follows:

\Lambda[k] = \frac{2\rho'}{L} diag\{ P_0^{-1}[k], ..., P_{2L-1}^{-1}[k] \} (equation 99)
P_m[k] = \gamma P_m[k-1] + (1-\gamma) ( P_{1,m}[k] + P_{2,m}[k] ) (equation 100)
P_{1,m}[k] = \sum_{j=M-N}^{M-1} | Y_{j,m}^n[k] |^2 (equation 101)
P_{2,m}[k] = \lambda P_{2,m}[k-1] + (1-\lambda) \frac{1}{\mu} \left| \sum_{j=M-N}^{M-1} ( | Y_{j,m}[k] |^2 - | Y_{j,m}^n[k] |^2 ) \right|. (equation 102)

Compared to Algorithm 1, (eq.95)-(eq.98) require one extra 2L-point FFT and 8NL-2N-2L
extra MAC per L samples and additional memory storage of a 2NL×1 real data
vector. To obtain the same time constant in the averaging operation as in the time-domain
version with K=1/(1-λ), λ should equal

The experimental results that follow will show that the performance of the stochastic
gradient algorithm is significantly improved by the low pass filter, especially for large
λ.

[0078] Now the computational complexity of the different stochastic gradient 
algorithms is discussed. Table 1 summarises the computational complexity (expressed 
as the number of real multiply-accumulates (MAC), divisions (D), square roots (Sq) and
absolute values (Abs)) of the time-domain (TD) and the frequency-domain (FD)
Stochastic Gradient (SG) based algorithms. Comparison is made with standard NLMS
and the NLMS based SPA. One complex multiplication is assumed to be equivalent to 4
real multiplications and 2 real additions. A 2L-point FFT of a real input vector requires
2Llog22L real MAC (assuming a radix-2 FFT algorithm). 

Table 1 indicates that the TD-SG algorithm without filter wo and the SPA are about 
twice as complex as the standard ANC. When applying a Low Pass filter (LP) to the 
regularisation term, the TD-SG algorithm has about three times the complexity of the 
ANC. The increase in complexity of the frequency-domain implementations is less.

     Algorithm                  Update formula                                                         Step size adaptation
TD   NLMS ANC                   ((2M - 2)L + 1) MAC                                                    1 D + (M - 1)L MAC
TD   NLMS based SPA             (4(M - 1)L + 1) MAC + 1 D + 1 Sq                                       1 D + (M - 1)L MAC
TD   SG                         (4NL + 5) MAC                                                          1 D + 1 Abs + (2NL + 2) MAC
TD   SG with LP                 (7NL + 4) MAC                                                          1 D + 1 Abs + (2NL + 4) MAC
FD   NLMS ANC                   (10M - 7 - [...]) + (6M - 2) log2 2L MAC                               1 D + (2M + 2) MAC
FD   NLMS based SPA             (14M - 11 - 4(M - 1)/[...]) + (6M - 2) log2 2L MAC + 1/L Sq + 1/L D    1 D + (2M + 2) MAC
FD   SG (Algorithm 1)           (18N + 6 - [...]) + (6N + 8) log2 2L MAC                               1 D + 1 Abs + (4N + 4) MAC
FD   SG with LP (Algorithm 2)   (26N + 4 - [...]) + (6N + 10) log2 2L MAC                              1 D + 1 Abs + (4N + 6) MAC

Table 1



[0079] As an illustration, Fig. 9 plots the complexity (expressed as the number of
Mega operations per second (Mops)) of the time-domain and the frequency-domain
stochastic gradient algorithm with LP filter as a function of L for M=3 and a sampling
frequency f_s=16 kHz. Comparison is made with the NLMS-based ANC of the GSC and
the SPA. The complexity of the FD SPA is not depicted, since for small M, it is
comparable to the cost of the FD-NLMS ANC. For L>8, the frequency-domain
implementations result in a significantly lower complexity compared to their time-domain
equivalents. The computational complexity of the FD stochastic gradient
algorithm with LP is limited, making it a good alternative to the SPA for
implementation in hearing aids. 

In Table 1 and Fig. 9 the complexity of the time-domain and the frequency-domain 
NLMS ANC and NLMS based SPA represents the complexity when the adaptive filter 
is only updated during noise only. If the adaptive filter is also updated during speech + 
noise using data from a noise buffer, the time-domain implementations additionally 
require NL MAC per sample and the frequency-domain implementations additionally 
require 2 FFT and (4L(M-1)-2(M-1)+L) MAC per L samples.
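
As a worked example of the complexity figures, the time-domain entries of Table 1 can be turned into operations per second. The sketch below makes several assumptions that are not stated in the text: every MAC, division and absolute value is counted as one operation, the counts are per input sample, no filter w0 is used (so N = M-1), and the function names are hypothetical.

```python
def td_nlms_anc_ops(n_mics, n_taps):
    """Operations per sample of the time-domain NLMS ANC (Table 1, first row)."""
    update = (2 * n_mics - 2) * n_taps + 1       # MAC for the update formula
    step = 1 + (n_mics - 1) * n_taps             # 1 D + MAC for the step size
    return update + step

def td_sg_lp_ops(n_channels, n_taps):
    """Operations per sample of the time-domain SG algorithm with LP filter."""
    update = 7 * n_channels * n_taps + 4         # MAC for the update formula
    step = 1 + 1 + (2 * n_channels * n_taps + 4) # 1 D + 1 Abs + MAC
    return update + step

def mops(ops_per_sample, fs_hz):
    """Mega operations per second at sampling frequency fs_hz."""
    return ops_per_sample * fs_hz / 1e6

M, L, fs = 3, 32, 16000
N = M - 1                                        # assumption: no filter w0

anc = mops(td_nlms_anc_ops(M, L), fs)
sg_lp = mops(td_sg_lp_ops(N, L), fs)
ratio = sg_lp / anc
```

Under these assumptions the NLMS ANC needs 194 operations per sample (about 3.1 Mops at 16 kHz) and the stochastic gradient algorithm with low pass filter needs 586 (about 9.4 Mops), a ratio of roughly three, in line with the statement that the TD-SG algorithm with LP has about three times the complexity of the ANC.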

[0080] The performance of the different FD stochastic gradient implementations of 
the SP-SDW-MWF is evaluated based on experimental results for a hearing aid 
application. Comparison is made with the FD-NLMS based SPA. For a fair comparison, 
the FD-NLMS based SPA is -like the stochastic gradient algorithms- also adapted 
during speech + noise using data from a noise buffer. 






[0081] The set-up is the same as described before (see also Fig. 5). The performance 
of the FD stochastic gradient algorithms is evaluated for a filter length L=32 taps per 
channel, ρ'=0.8 and γ=0. To exclude the effect of the spatial pre-processor, the
performance measures are calculated w.r.t. the output of the fixed beamformer. The 
sensitivity of the algorithms against errors in the assumed signal model is illustrated for 
microphone mismatch, e.g. a gain mismatch = 4 dB of the second microphone. 

[0082] Fig. 10(a) and (b) compare the performance of the different FD Stochastic 
Gradient (SG) SP-SDW-MWF algorithms without w0 (i.e., the SDR-GSC) as a function
of the trade-off parameter μ for a stationary and a non-stationary (e.g. multi-talker
babble) noise source, respectively, at 90°. To analyse the impact of the approximation
(eq.79) on the performance, the result of a FD implementation of (eq.78), which uses
the clean speech, is depicted too. This algorithm is referred to as optimal FD-SG
algorithm. Without Low Pass (LP) filter, the stochastic gradient algorithm achieves a
worse performance than the optimal FD-SG algorithm (eq.78), especially for large 1/μ.
For a stationary speech-like noise source, the FD-SG algorithm does not suffer too
much from approximation (eq.79). In a highly time-varying noise scenario, such as
multi-talker babble, the limited averaging of r[k] in the FD implementation does not
suffice to maintain the large noise reduction achieved by (eq.78). The loss in noise
reduction performance could be reduced by decreasing the step size ρ', at the expense of
a reduced convergence speed. Applying the low pass filter (eq.95) with e.g. λ=0.999
significantly improves the performance for all 1/μ, while changes in the noise scenario
can still be tracked. 

[0083] Fig. 11 plots the SNR improvement ΔSNR_intellig and the speech distortion
SD_intellig of the SP-SDW-MWF (1/μ=0.5) with and without filter w0 for the babble noise
scenario as a function of the exponential weighting factor λ of the LP filter
(see (eq.95)). Performance clearly improves for increasing λ. For small λ, the SP-SDW-MWF
with w0 suffers from a larger excess error - and hence worse ΔSNR_intellig -
compared to the SP-SDW-MWF without w0. This is due to the larger dimensions of
(1/μ) E{y^s y^{s,H}}.

[0084] The LP filter reduces fluctuations in the filter weights W_i[k] caused by poor
estimates of the short-term speech correlation matrix E{y^s y^{s,H}} and/or by the highly non-stationary
short-term speech spectrum. In contrast to a decrease in step size ρ', the LP
filter does not compromise tracking of changes in the noise scenario. As an illustration,
Fig. 12 plots the convergence behaviour of the FD stochastic gradient algorithm without
w0 (i.e. the SDR-GSC) for λ=0 and λ=0.9998, respectively, when the noise source
position suddenly changes from 90° to 180°. A gain mismatch of 4 dB was applied
to the second microphone. To avoid fast fluctuations in the residual noise energy ε_n²
and the speech distortion energy ε_d², the desired and the interfering noise source in this
experiment are stationary, speech-like. The upper figure depicts the residual noise
energy ε_n² as a function of the number of input samples, the lower figure plots the
residual speech distortion ε_d² during speech + noise periods as a function of the number
of speech + noise samples. Both algorithms (i.e., λ=0 and λ=0.9998) have about the
same convergence rate. When the change in position occurs, the algorithm with
λ=0.9998 even converges faster. For λ=0, the approximation error (eq.79) remains large
for a while since the noise vectors in the buffer are not up to date. For λ=0.9998, the
impact of the instantaneous large approximation error is reduced thanks to the low pass
filter.

[0085] Fig. 13 and Fig. 14 compare the performance of the FD stochastic gradient 
algorithm with LP filter (λ=0.9998) and the FD-NLMS based SPA in a multiple noise
source scenario. The noise scenario consists of 5 multi-talker babble noise sources
positioned at angles 75°, 120°, 180°, 240°, 285° w.r.t. the desired source at 0°. To assess
the sensitivity of the algorithms against errors in the assumed signal model, the
influence of microphone mismatch, i.e. gain mismatch = 4 dB of the second
microphone, on the performance is depicted too. In Fig. 13, the SNR improvement
ΔSNR_intellig and the speech distortion SD_intellig of the SP-SDW-MWF with and without
filter w0 is depicted as a function of the trade-off parameter 1/μ. Fig. 14 shows the
performance of the QIC-GSC

w^H w \leq \beta^2 (equation 103)

for different constraint values β², which is implemented using the FD-NLMS based
SPA.

The SPA and the stochastic gradient based SP-SDW-MWF both increase the robustness
of the GSC (i.e., the SP-SDW-MWF without w0 and 1/μ=0). For a given maximum
allowable speech distortion SD_intellig, the SP-SDW-MWF with and without w0 achieve a
better noise reduction performance than the SPA. The performance of the SP-SDW-MWF
with w0 is - in contrast to the SP-SDW-MWF without w0 - not affected by
microphone mismatch. In the absence of model errors, the SP-SDW-MWF with w0
achieves a slightly worse performance than the SP-SDW-MWF without w0. This can be
explained by the fact that with w0, the estimate of (1/μ) E{y^s y^{s,H}} is less accurate due to
the larger dimensions of (1/μ) E{y^s y^{s,H}} (see also Fig. 11). In conclusion, the proposed

stochastic gradient implementation of the SP-SDW-MWF preserves the benefit of the 
SP-SDW-MWF over the QIC-GSC. 



Improvement 2: frequency-domain stochastic gradient algorithm using
correlation matrices

[0086] It is now shown that by approximating the regularisation term in the 
frequency-domain, (diagonal) speech and noise correlation matrices can be used instead 
of data buffers, such that the memory usage is decreased drastically, while also the 
computational complexity is further reduced. Experimental results demonstrate that this 
approximation results in a small -positive or negative- performance difference 
compared to the stochastic gradient algorithm with low pass filter, such that the 
proposed algorithm preserves the robustness benefit of the SP-SDW-MWF over the
QIC-GSC, while both its computational complexity and memory usage are now 
comparable to the NLMS-based SPA for implementing the QIC-GSC. 

[0087] As the estimate of r[k] in (eq. 80) proved to be quite poor, resulting in a large
excess error, it was suggested in (eq. 88) to use an estimate of the average clean speech
correlation matrix. This allows r[k] to be computed as

r[k] = \frac{1}{\mu}(1-\lambda)\sum_{l=0}^{k}\lambda^{k-l}\left(y_{buf_1}[l]\,y_{buf_1}^{H}[l] - y^{n}[l]\,y^{n,H}[l]\right) w[k] (equation 104)

with λ an exponential weighting factor. For stationary noise a small λ, i.e.
1/(1-λ) ~ NL, suffices. However, in practice the speech and the noise signals are often
spectrally highly non-stationary (e.g. multi-talker babble noise), whereas their long-term
spectral and spatial characteristics usually vary more slowly in time. Spectrally highly
non-stationary noise can still be spatially suppressed by using an estimate of the long-term
correlation matrix in r[k], i.e. 1/(1-λ) >> NL.

In order to avoid expensive matrix operations for computing (eq. 104), it was previously
assumed that w[k] varies slowly in time, i.e. w[k] ≈ w[l], such that (eq. 104) can be
approximated with vector instead of matrix operations by directly applying a low pass
filter to the regularisation term r[k], cf. (eq. 92),

r[k] = \frac{1}{\mu}(1-\lambda)\sum_{l=0}^{k}\lambda^{k-l}\left(y_{buf_1}[l]\,y_{buf_1}^{H}[l] - y^{n}[l]\,y^{n,H}[l]\right) w[l] (equation 105)

= \lambda\, r[k-1] + (1-\lambda)\frac{1}{\mu}\left(y_{buf_1}[k]\,y_{buf_1}^{H}[k] - y^{n}[k]\,y^{n,H}[k]\right) w[k] (equation 106)

However, this assumption is actually not required in a frequency-domain
implementation, as will now be shown.
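The difference between the matrix form and the low-pass filtered form of the regularisation term can be sketched as follows. This is a minimal illustration under stated assumptions: signals are real-valued (so the Hermitian transpose reduces to a transpose), vectors are plain Python lists, and the function names are hypothetical. The recursive step only needs inner products and scaled vectors, whereas the direct form builds the full exponentially weighted matrix. For a filter w that does not change over time, both give the same result.

```python
def dot(a, b):
    """Inner product of two real-valued vectors."""
    return sum(x * y for x, y in zip(a, b))


def regulariser_step(r_prev, y_buf, y_n, w, lam, mu):
    """One low-pass update of r[k], using vector operations only.

    (y_buf y_buf^T - y_n y_n^T) w is evaluated as two scaled vectors,
    so no correlation matrix is ever formed.
    """
    a = dot(y_buf, w)
    b = dot(y_n, w)
    return [lam * r + (1.0 - lam) * (yb * a - yn * b) / mu
            for r, yb, yn in zip(r_prev, y_buf, y_n)]


def regulariser_direct(y_bufs, y_ns, w, lam, mu):
    """Explicit exponentially weighted matrix sum applied to a fixed w."""
    n = len(w)
    k = len(y_bufs) - 1
    R = [[0.0] * n for _ in range(n)]
    for l, (yb, yn) in enumerate(zip(y_bufs, y_ns)):
        weight = (1.0 - lam) * lam ** (k - l)
        for i in range(n):
            for j in range(n):
                R[i][j] += weight * (yb[i] * yb[j] - yn[i] * yn[j])
    return [dot(row, w) / mu for row in R]


if __name__ == "__main__":
    y_bufs = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
    y_ns = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4]]
    w = [0.7, -0.3]
    r = [0.0, 0.0]
    for yb, yn in zip(y_bufs, y_ns):
        r = regulariser_step(r, yb, yn, w, lam=0.9, mu=2.0)
    print(r, regulariser_direct(y_bufs, y_ns, w, lam=0.9, mu=2.0))
```

When w changes between iterations, the two forms differ, which is why the recursive form relies on the assumption of a slowly varying filter.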

[0088] The frequency-domain algorithm called Algorithm 2 requires large data
buffers and hence the storage of a large amount of data (note that to achieve a good
performance, typical values for the buffer lengths of the circular buffers B1 and B2 are
10000...20000). A substantial memory (and computational complexity) reduction can
be achieved by the following two steps:

• When using (eq. 104) instead of (eq. 106) for calculating the regularisation
term, correlation matrices instead of data samples need to be stored. The
frequency-domain implementation of the resulting algorithm is summarised in
Algorithm 3, where 2L×2L-dimensional speech and noise correlation matrices
S_ij[k] and S^n_ij[k], i, j = M-N...M-1, are used for calculating the regularisation
term R_i[k] and (part of) the step size Λ[k]. These correlation matrices are
updated during speech + noise periods and noise only periods, respectively.
When using correlation matrices, filter adaptation can only take place during
noise only periods, since during speech + noise periods the desired signal cannot
be constructed from the noise buffer B2 anymore. This first step however does
not necessarily reduce the memory usage (N·L_buf1 for data buffers vs. 2(NL)² for
correlation matrices) and will even increase the computational complexity, since
the correlation matrices are not diagonal.

• The correlation matrices in the frequency-domain can be approximated
by diagonal matrices, since F k^T k F^{-1} in Algorithm 3 can be well approximated
by I_{2L}/2. Hence, the speech and the noise correlation matrices are updated as

S_{ij}[k] = \lambda S_{ij}[k-1] + (1-\lambda) Y_{i}^{H}[k] Y_{j}[k] / 2, (equation 107)

S_{ij}^{n}[k] = \lambda S_{ij}^{n}[k-1] + (1-\lambda) Y_{i}^{n,H}[k] Y_{j}^{n}[k] / 2, (equation 108)

leading to a significant reduction in memory usage and computational
complexity, while having a minimal impact on the performance and the
robustness. This algorithm will be referred to as Algorithm 4.
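Because the frequency-domain input matrices are diagonal, the approximated correlation update reduces to an element-wise recursion per frequency bin. The sketch below illustrates this under stated assumptions: each diagonal matrix is stored as a list of complex bin values, the same routine is called with speech + noise blocks or with noise-only blocks depending on the detector decision, and all names are hypothetical.

```python
def update_diag_correlation(S_prev, Y_i, Y_j, lam):
    """Per-bin update S_ij[k] = lam * S_ij[k-1] + (1 - lam) * conj(Y_i[k]) * Y_j[k] / 2.

    S_prev : diagonal of S_ij[k-1], one complex value per frequency bin
    Y_i    : diagonal of Y_i[k] (spectrum of channel i for the current block)
    Y_j    : diagonal of Y_j[k] (spectrum of channel j for the current block)
    lam    : exponential weighting factor
    """
    return [lam * s + (1.0 - lam) * complex(yi).conjugate() * yj / 2.0
            for s, yi, yj in zip(S_prev, Y_i, Y_j)]


if __name__ == "__main__":
    S = [0j, 0j]
    Y_i = [1 + 1j, 2 + 0j]
    Y_j = [1 - 1j, 3j]
    S = update_diag_correlation(S, Y_i, Y_j, lam=0.5)
    print(S)
```

Only one value per bin and per channel pair has to be kept, instead of a full matrix per channel pair.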
Algorithm 3 Frequency-domain implementation with correlation matrices (without
approximation)

Initialisation and matrix definitions:

W_{i}[0] = [0 \cdots 0]^{T}, i = M-N ... M-1
P_{m}[0] = \delta_{m}, m = 0 ... 2L-1
F = 2L × 2L-dimensional DFT matrix
g = \begin{bmatrix} I_{L} & 0_{L} \\ 0_{L} & 0_{L} \end{bmatrix}, k = [0_{L} \; I_{L}]
0_{L} = L×L-dim. zero matrix, I_{L} = L×L-dim. identity matrix

For each new block of L samples (per channel):

d[k] = [y_{0}[kL-\Delta] \cdots y_{0}[kL-\Delta+L-1]]^{T}
Y_{i}[k] = diag\{F [y_{i}[kL-L] \cdots y_{i}[kL+L-1]]^{T}\}, i = M-N ... M-1

Output signal:

e[k] = d[k] - k F^{-1} \sum_{j=M-N}^{M-1} Y_{j}[k] W_{j}[k], E[k] = F k^{T} e[k]

If speech detected:

S_{ij}[k] = (1-\lambda) \sum_{l=0}^{k} \lambda^{k-l} Y_{i}^{H}[l] F k^{T} k F^{-1} Y_{j}[l] = \lambda S_{ij}[k-1] + (1-\lambda) Y_{i}^{H}[k] F k^{T} k F^{-1} Y_{j}[k]

If noise detected: Y_{i}[k] → Y_{i}^{n}[k]

S_{ij}^{n}[k] = (1-\lambda) \sum_{l=0}^{k} \lambda^{k-l} Y_{i}^{n,H}[l] F k^{T} k F^{-1} Y_{j}^{n}[l] = \lambda S_{ij}^{n}[k-1] + (1-\lambda) Y_{i}^{n,H}[k] F k^{T} k F^{-1} Y_{j}^{n}[k]

Update formula (only during noise-only periods):

R_{i}[k] = \frac{1}{\mu} \sum_{j=M-N}^{M-1} [S_{ij}[k] - S_{ij}^{n}[k]] W_{j}[k], i = M-N ... M-1
W_{i}[k+1] = W_{i}[k] + F g F^{-1} \Lambda[k] \{ Y_{i}^{n,H}[k] E[k] - R_{i}[k] \}, i = M-N ... M-1

with

\Lambda[k] = \frac{2\rho'}{L} diag\{P_{0}^{-1}[k], ..., P_{2L-1}^{-1}[k]\}
P_{m}[k] = \gamma P_{m}[k-1] + (1-\gamma)(P_{1,m}[k] + P_{2,m}[k]), m = 0 ... 2L-1
P_{1,m}[k], P_{2,m}[k]: sums over j = M-N ... M-1, m = 0 ... 2L-1 (expressions not legible in the source text)
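The structure of the projection F k^T k F^{-1} that appears in the correlation updates of Algorithm 3 can be inspected numerically. The following minimal sketch assumes k = [0_L I_L] and a 2L-point DFT matrix F built without normalisation; the helper names are hypothetical. It shows that every diagonal entry equals exactly 1/2, which is the part retained when the matrix is replaced by a scaled identity. The off-diagonal entries at odd offsets are not zero, so that replacement is an approximation rather than an identity.

```python
import cmath


def dft_matrix(n):
    """Unnormalised n-point DFT matrix as a list of rows."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]


def projection_matrix(L):
    """Return F k^T k F^{-1} for k = [0_L I_L] and the 2L-point DFT matrix F."""
    n = 2 * L
    F = dft_matrix(n)
    # k^T k is diagonal: zeros for the first L samples, ones for the last L.
    keep = [0.0] * L + [1.0] * L
    # F is symmetric, so F^{-1}[t][c] = conj(F[t][c]) / n.
    return [[sum(F[r][t] * keep[t] * F[t][c].conjugate() for t in range(n)) / n
             for c in range(n)]
            for r in range(n)]


if __name__ == "__main__":
    P = projection_matrix(4)
    print([round(abs(P[i][i]), 6) for i in range(8)])   # diagonal entries
    print(round(abs(P[0][1]), 6), round(abs(P[0][2]), 6))  # odd and even offsets
```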
[0089] Table 2 summarises the computational complexity and the memory usage of
the frequency-domain NLMS-based SPA for implementing the QIC-GSC and the
frequency-domain stochastic gradient algorithms for implementing the SP-SDW-MWF
(Algorithm 2 and Algorithm 4). The computational complexity is again expressed as the
number of Mega operations per second (Mops), while the memory usage is expressed in
kWords. The following parameters have been used: M=3, L=32, fs=16 kHz,
L_buf1=10000, (a) N=M-1, (b) N=M. From this table the following conclusions can be
drawn:

• The computational complexity of the SP-SDW-MWF (Algorithm 2) with
filter w0 is about twice the complexity of the QIC-GSC (and even less if the
filter w0 is not used). The approximation of the regularisation term in Algorithm
4 further reduces the computational complexity. However, this only remains true
for a small number of input channels, since the approximation introduces a
quadratic term O(N²).

• Due to the storage of data samples in the circular speech + noise buffer
B1, the memory usage of the SP-SDW-MWF (Algorithm 2) is quite high in
comparison with the QIC-GSC (depending on the size of the data buffer L_buf1 of
course). By using the approximation of the regularisation term in Algorithm 4,
the memory usage can be reduced drastically, since now diagonal correlation
matrices instead of data buffers need to be stored. Note however that also for the
memory usage a quadratic term O(N²) is present.



Computational complexity

Algorithm                 Update formula                                     Step size adaptation   Mops
NLMS based SPA            (14M - 11 - [illegible]) + (6M - 2) log2(2L) MAC   (2M + 2) MAC           2.16
                          + 1/L Sq + 1/L D                                   + 1 D
SG with LP                [illegible] + (6N + 10) log2(2L) MAC               (4N + 6) MAC           3.22 (a), 4.27 (b)
(Algorithm 2)                                                                + 1 D + 1 Abs
SG with correlation       (10N² + 13N - (4N² + 3N)/L)                        (2N + 4) MAC           2.71 (a), 4.31 (b)
matrices (Algorithm 4)    + (6N + 4) log2(2L) MAC                            + 1 D + 1 Abs

Memory usage

Algorithm                 Words                      kWords
NLMS based SPA            4(M - 1)L + 6L             0.45
SG with LP                2N·L_buf1 + 6LN + 7L       40.61 (a), 60.80 (b)
(Algorithm 2)
SG with correlation       4LN² + 6LN + 7L            1.12 (a), 1.95 (b)
matrices (Algorithm 4)

Table 2 
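The memory figures of Table 2 can be checked with a few lines of arithmetic. The sketch below assumes M=3 microphones and a constant term of 7L in the two stochastic gradient formulas; under this reading of the table the formulas reproduce the tabulated kWords values for L=32 and a buffer length of 10000. The function name is hypothetical.

```python
def memory_kwords(M, N, L, L_buf1):
    """Memory usage in kWords for the three implementations of Table 2.

    Returns (NLMS based SPA, SG with LP filter, SG with correlation matrices).
    Assumes a constant term of 7L in the two stochastic gradient formulas.
    """
    spa = 4 * (M - 1) * L + 6 * L
    sg_lp = 2 * N * L_buf1 + 6 * L * N + 7 * L
    sg_corr = 4 * L * N ** 2 + 6 * L * N + 7 * L
    return tuple(words / 1000.0 for words in (spa, sg_lp, sg_corr))


if __name__ == "__main__":
    M, L, L_buf1 = 3, 32, 10000
    for label, N in (("(a) N=M-1", M - 1), ("(b) N=M", M)):
        print(label, [round(v, 2) for v in memory_kwords(M, N, L, L_buf1)])
```

The buffer term 2N·L_buf1 dominates the second formula, whereas the third grows only with 4LN², which is where the memory saving of the correlation-matrix implementation comes from for a small number of channels.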



[0090] It is now shown that practically no performance difference exists between 
Algorithm 2 and Algorithm 4, such that the SP-SDW-MWF using the implementation 
with (diagonal) correlation matrices still preserves its robustness benefit over the GSC 
(and the QIC-GSC). The same set-up has been used as for the previous experiments. 

The performance of the stochastic gradient algorithms in the frequency-domain is
evaluated for a filter length L=32 per channel, ρ'=0.8, γ=0.95 and λ=0.9998. For all
considered algorithms, filter adaptation only takes place during noise only periods. To
exclude the effect of the spatial pre-processor, the performance measures are calculated 
with respect to the output of the fixed beamformer. The sensitivity of the algorithms 
against errors in the assumed signal model is illustrated for microphone mismatch, i.e. a 
gain mismatch = 4 dB at the second microphone. 

[0091] Fig. 15 and Fig. 16 depict the SNR improvement ΔSNR_intellig and the speech
distortion SD_intellig of the SP-SDW-MWF (with w0) and the SDR-GSC (without w0),
implemented using Algorithm 2 (solid line) and Algorithm 4 (dashed line), as a function
of the trade-off parameter 1/μ. These figures also depict the effect of a gain mismatch
of 4 dB at the second microphone. From these figures it can be observed that
approximating the regularisation term in the frequency-domain only results in a small
performance difference. For most scenarios the performance is even better (i.e. larger
SNR improvement and smaller speech distortion) for Algorithm 4 than for Algorithm 2.
[0092] Hence, also when implementing the SP-SDW-MWF using the proposed
Algorithm 4, it still preserves its robustness benefit over the GSC (and the QIC-GSC).
E.g. it can be observed that the GSC (i.e. SDR-GSC with 1/μ=0) will result in a large
speech distortion (and a smaller SNR improvement) when microphone mismatch
occurs. Both the SDR-GSC and the SP-SDW-MWF add robustness to the GSC, i.e. the
distortion decreases for increasing 1/μ. The performance of the SP-SDW-MWF (with
w0) is again hardly affected by microphone mismatch.

[0093] All documents, patents, journal articles and other materials cited in the
present application are hereby incorporated by reference. 

[0094] Although the present invention has been fully described in conjunction with 
several embodiments thereof with reference to the accompanying drawings, it is to be 
understood that various changes and modifications may be apparent to those skilled in 
the art. Such changes and modifications are to be understood as included within the 
scope of the present invention as defined by the appended claims, unless they depart 
therefrom.






Atty. Docket No. COCH-0185-US1 / Customer No. 22.506 / Client Ref. No. CID 311 US

ABSTRACT
METHOD AND DEVICE FOR NOISE REDUCTION

[0095] In one aspect of the present invention, a method to reduce noise
in a noisy speech signal is disclosed. The method comprises
applying at least two versions of the noisy speech signal to a first filter, whereby that
first filter outputs a speech reference signal and at least one noise reference signal,
applying a filtering operation to each of the at least one noise reference signals, and
subtracting from the speech reference signal each of the filtered noise reference signals,
wherein the filtering operation is performed with filters having
filter coefficients determined by taking into account speech leakage contributions in the
at least one noise reference signal.



(Figure 3)